Selective amplification using blocking oligonucleotides

ABSTRACT

Disclosed herein are methods and compositions for selectively amplifying and/or extending nucleic acid target molecules in a sample. The methods and compositions can, for example, reduce the amplification and/or extension of undesirable nucleic acid species in the sample, and/or allow selective removal of undesirable nucleic acid species in the sample.

RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 15/875,816, filed on Jan. 19, 2018, which claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 62/453,163, filed on Feb. 1, 2017. The content of these related applications is expressly incorporated herein by reference in its entirety.

BACKGROUND

The expression level of different genes can vary significantly in a biological sample. For example, some broad categories of gene expression are: 1) “high expressers,” which comprise 5-10 genes that dominate ~20% of cellular mRNAs; 2) “intermediate expressers,” which comprise 50-200 genes that occupy 40-60% of cellular mRNAs; and 3) “moderate expressers,” which comprise 10,000-20,000 genes that occupy the rest of the cellular mRNA fraction. One challenge in molecular biology and molecular genetics is to capture this highly dynamic gene expression profile efficiently and accurately in order to distinguish different cell types and phenotypes in the sample.

In recent years, next generation sequencing (NGS) has provided a high-throughput method for assessing gene expression profiles. During library preparation for NGS, a sample with heterogeneous cDNA species is amplified by PCR to obtain an adequate sample amount and to attach NGS-compatible adapters. The sequencing process captures the number of reads for each gene from the PCR-amplified library sample to interpret the gene expression level. However, since different genes are expressed over a large range of levels, PCR amplification can skew the native gene expression. For example, a gene that has 1 molecule of cDNA would require 40 cycles of PCR to achieve the same representative amount as a gene with 1000 molecules of cDNA in 30 cycles. In a heterogeneous cDNA sample, PCR is usually performed in excess cycles to adequately amplify low expressers; in those scenarios, the native gene expression profile is usually skewed by the dominating high expresser PCR products. A method to correct for such a bias in PCR product is Molecular Indexing; however, high expressers such as ribosomal protein mRNAs, mitochondrial mRNAs, or housekeeping genes often dominate the sequencing run with little contribution to the experimental interpretation. There is a need for selectively amplifying sequences of interest.
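By way of a non-limiting illustration, the cycle arithmetic in the preceding example can be sketched in Python. The sketch assumes idealized PCR in which every cycle multiplies the copy number by (1 + efficiency); the function name cycles_to_reach and the efficiency parameter are hypothetical and are not part of the disclosure.

```python
def cycles_to_reach(start_copies, target_copies, efficiency=1.0):
    """Return the smallest whole number of PCR cycles needed to reach target_copies.

    Assumption (not from the disclosure): each cycle multiplies the copy
    number by (1 + efficiency), where efficiency = 1.0 is perfect doubling.
    """
    if start_copies <= 0 or target_copies <= 0:
        raise ValueError("copy numbers must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    copies = float(start_copies)
    cycles = 0
    # Simulate cycle by cycle until the target amount is reached.
    while copies < target_copies:
        copies *= 1 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Amount produced by a gene with 1000 cDNA molecules after 30 ideal cycles.
    abundant_after_30 = 1000 * 2**30
    # Cycles a single-molecule gene needs to reach the same amount.
    print(cycles_to_reach(1, abundant_after_30))  # 40
```

Under the perfect-doubling assumption, a single starting molecule needs 10 additional cycles to catch up with 1000 starting molecules, because 2^10 = 1024 is approximately 1000.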

SUMMARY

Disclosed herein is a method of selective amplification. In some embodiments, the method comprises: providing a sample comprising a plurality of nucleic acid target molecules and one or more undesirable nucleic acid species; providing a plurality of oligonucleotide probes, wherein each of the plurality of oligonucleotide probes comprises a molecular label sequence and a binding region; contacting the plurality of oligonucleotide probes with the plurality of nucleic acid target molecules for hybridization; extending oligonucleotide probes that are hybridized to the plurality of nucleic acid target molecules to generate a plurality of extension products; providing a blocking oligonucleotide that specifically binds to at least one of the one or more undesirable nucleic acid species; and amplifying the plurality of extension products to generate a plurality of amplicons, whereby the amplification or the extension of the undesirable nucleic acid species is reduced by the blocking oligonucleotide.

Also disclosed herein is a method of selective extension. In some embodiments, the method comprises: providing a sample comprising a plurality of nucleic acid target molecules and one or more undesirable nucleic acid species; providing a plurality of oligonucleotide probes, wherein each of the plurality of oligonucleotide probes comprises a molecular label sequence and a binding region; contacting the plurality of oligonucleotide probes with the plurality of nucleic acid target molecules for hybridization; providing a blocking oligonucleotide that specifically binds to at least one of the one or more undesirable nucleic acid species; and extending oligonucleotide probes that are hybridized to the plurality of nucleic acid target molecules to generate a plurality of extension products; whereby the extension of the undesirable nucleic acid species is reduced by the blocking oligonucleotide.

In the methods and compositions disclosed herein, the blocking oligonucleotide can be, for example, a locked nucleic acid (LNA), a peptide nucleic acid (PNA), a DNA, an LNA/PNA chimera, an LNA/DNA chimera, or a PNA/DNA chimera. In some embodiments, the methods comprise providing blocking oligonucleotides that specifically bind to two or more undesirable nucleic acid species in the sample. In some embodiments, the methods comprise providing blocking oligonucleotides that specifically bind to at least 10 undesirable nucleic acid species in the sample. In some embodiments, the methods comprise providing blocking oligonucleotides that specifically bind to at least 100 undesirable nucleic acid species in the sample.

The blocking oligonucleotide can have a Tm of at least 60° C., a Tm of at least 65° C., or a Tm of at least 70° C. In some embodiments, the blocking oligonucleotide is unable to function as a primer for a reverse transcriptase or a polymerase. In some embodiments, the amplification or the extension of the undesirable nucleic acid species is reduced by at least 50%. In some embodiments, the amplification or the extension of the undesirable nucleic acid species is reduced by at least 80%. In some embodiments, the amplification or the extension of the undesirable nucleic acid species is reduced by at least 90%. In some embodiments, the amplification or the extension of the undesirable nucleic acid species is reduced by at least 95%. In some embodiments, the amplification or the extension of the undesirable nucleic acid species is reduced by at least 99%.

In some embodiments, the blocking oligonucleotide is 10 nt to 50 nt long. In some embodiments, the blocking oligonucleotide is 20 nt to 30 nt long. In some embodiments, the blocking oligonucleotide is about 25 nt long.

In some embodiments, the one or more undesirable nucleic acid species amounts to about 50% of the nucleic acid content of the sample. In some embodiments, the one or more undesirable nucleic acid species amounts to about 60% of the nucleic acid content of the sample. In some embodiments, the one or more undesirable nucleic acid species amounts to about 70% of the nucleic acid content of the sample. In some embodiments, the one or more undesirable nucleic acid species amounts to about 80% of the nucleic acid content of the sample.

In some embodiments, the undesirable nucleic acid species is selected from the group consisting of rRNA, mtRNA, genomic DNA, intronic sequence, high abundance sequence, and any combination thereof. In some embodiments, the blocking oligonucleotides specifically bind to within 100 nt of the 3′ end of the one or more undesirable nucleic acid species. In some embodiments, the blocking oligonucleotides specifically bind to within 100 nt of the 5′ end of the one or more undesirable nucleic acid species. In some embodiments, the blocking oligonucleotides specifically bind to within 100 nt of the middle of the one or more undesirable nucleic acid species.

In some embodiments, the methods further comprise removing the hybridized complex formed between the blocking oligonucleotide and the undesirable nucleic acid species. In some embodiments, the removing comprises immobilizing the hybridized complex formed between the blocking oligonucleotide and the undesirable nucleic acid species on a solid support. In some embodiments, the blocking oligonucleotide comprises an affinity moiety. In some embodiments, the solid support comprises a binding partner of the affinity moiety. In some embodiments, the affinity moiety is a functional group selected from the group consisting of biotin, streptavidin, heparin, an aptamer, a click-chemistry moiety, digoxigenin, primary amine(s), carboxyl(s), hydroxyl(s), aldehyde(s), ketone(s), and any combination thereof. In some embodiments, the affinity moiety is biotin. In some embodiments, the solid support comprises streptavidin.

In some embodiments, the amplifying comprises PCR amplification of the plurality of extension products. In some embodiments, each of the plurality of oligonucleotide probes comprises a cell label sequence, a sample label sequence, a location label sequence, a binding site for a universal primer, or any combination thereof. In some embodiments, the plurality of oligonucleotide probes comprises at least 100 different molecular label sequences. In some embodiments, the plurality of oligonucleotide probes comprises at least 1,000 different molecular label sequences. In some embodiments, the plurality of oligonucleotide probes comprises at least 10,000 different molecular label sequences. In some embodiments, the plurality of oligonucleotide probes comprises the same cell label sequence. In some embodiments, the plurality of amplicons comprises a cDNA library. In some embodiments, the sample comprises a single cell, a plurality of cells, a tissue sample, or any combination thereof. In some embodiments, the sample is a single cell.

In some embodiments, the methods further comprise sequencing the plurality of amplicons. In some embodiments, the undesirable nucleic acid species represents less than 50% of the plurality of amplicons. In some embodiments, the undesirable nucleic acid species represents less than 20% of the plurality of amplicons. In some embodiments, the undesirable nucleic acid species represents less than 10% of the plurality of amplicons. In some embodiments, the undesirable nucleic acid species represents less than 5% of the plurality of amplicons. In some embodiments, the plurality of oligonucleotide probes is immobilized on a substrate. In some embodiments, the substrate is a particle. In some embodiments, the substrate is a bead. In some embodiments, the plurality of nucleic acid target molecules comprises mRNA target molecules. In some embodiments, the binding region comprises a poly-dT sequence.

Also disclosed herein is a kit for selective amplification of a plurality of nucleic acid target molecules in a sample. In some embodiments, the kit comprises: a plurality of oligonucleotide probes, wherein each of the plurality of oligonucleotide probes comprises a molecular label sequence and a binding region; and a plurality of blocking oligonucleotides that specifically bind to a plurality of undesirable nucleic acid species in the sample, wherein each blocking oligonucleotide probe is unable to function as a primer for a reverse transcriptase or a polymerase.

In some embodiments, the blocking oligonucleotide is a locked nucleic acid (LNA), a peptide nucleic acid (PNA), a DNA, an LNA/PNA chimera, an LNA/DNA chimera, or a PNA/DNA chimera. In some embodiments, the kit comprises blocking oligonucleotides that specifically bind to two or more undesirable nucleic acid species. In some embodiments, the kit comprises blocking oligonucleotides that specifically bind to at least 10 undesirable nucleic acid species. In some embodiments, the kit comprises blocking oligonucleotides that specifically bind to at least 100 undesirable nucleic acid species.

In some embodiments, the blocking oligonucleotide is 10 nt to 50 nt long. In some embodiments, the blocking oligonucleotide is 20 nt to 30 nt long. In some embodiments, the blocking oligonucleotide is about 25 nt long. In some embodiments, the undesirable nucleic acid species is selected from the group consisting of rRNA, mtRNA, genomic DNA, intronic sequence, high abundance sequence, and any combination thereof.

In some embodiments, the blocking oligonucleotides specifically bind to within 100 nt of the 3′ end of the undesirable nucleic acid species. In some embodiments, the blocking oligonucleotides specifically bind to within 100 nt of the 5′ end of the undesirable nucleic acid species. In some embodiments, the blocking oligonucleotides specifically bind to within 100 nt of the middle of the undesirable nucleic acid species. In some embodiments, the blocking oligonucleotide comprises an affinity moiety.

In some embodiments, each of the plurality of oligonucleotide probes comprises a cell label sequence, a sample label sequence, a location label sequence, a binding site for a universal primer, or any combination thereof. In some embodiments, the binding region comprises poly-dT. In some embodiments, the plurality of oligonucleotide probes is immobilized on a substrate. In some embodiments, the substrate is a particle, for example a bead.

In some embodiments, the kit further comprises an enzyme. The enzyme can be, for example, a reverse transcriptase, a polymerase, a ligase, a nuclease, or any combination thereof. In some embodiments, each blocking oligonucleotide probe has a Tm of at least 60° C. In some embodiments, the plurality of oligonucleotide probes comprises at least 100 different molecular label sequences. In some embodiments, the plurality of oligonucleotide probes comprises at least 1,000 different molecular label sequences. In some embodiments, the plurality of oligonucleotide probes comprises at least 10,000 different molecular label sequences. In some embodiments, the plurality of oligonucleotide probes comprises the same cell label sequence.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a schematic illustration of an exemplary method for selective amplification using blocking oligonucleotides that specifically bind to undesirable nucleic acid species.

DETAILED DESCRIPTION

Definitions

Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art in the field to which this disclosure belongs. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.

As used herein, the term “associated” or “associated with” can mean that two or more species are identifiable as being co-located at a point in time. An association can mean that two or more species are or were within a similar container. An association can be an informatics association, where for example digital information regarding two or more species is stored and can be used to determine that one or more of the species were co-located at a point in time. An association can also be a physical association. In some instances, two or more associated species are “tethered”, “attached”, or “immobilized” to one another or to a common solid or semisolid surface. An association may refer to covalent or non-covalent means for attaching labels to solid or semi-solid supports such as beads. An association may comprise hybridization between a target and a label.

As used herein, the term “complementary” can refer to the capacity for precise pairing between two nucleotides. For example, if a nucleotide at a given position of a nucleic acid is capable of hydrogen bonding with a nucleotide of another nucleic acid, then the two nucleic acids are considered to be complementary to one another at that position. Complementarity between two single-stranded nucleic acid molecules may be “partial,” in which only some of the nucleotides bind, or it may be complete when total complementarity exists between the single-stranded molecules. A first nucleotide sequence can be said to be the “complement” of a second sequence if the first nucleotide sequence is complementary to the second nucleotide sequence. A first nucleotide sequence can be said to be the “reverse complement” of a second sequence, if the first nucleotide sequence is complementary to a sequence that is the reverse (i.e., the order of the nucleotides is reversed) of the second sequence. As used herein, the terms “complement”, “complementary”, and “reverse complement” can be used interchangeably. It is understood from the disclosure that if a molecule can hybridize to another molecule it may be the complement of the molecule that is hybridizing.
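As a non-limiting sketch, complete and partial complementarity as described above can be illustrated in Python for unmodified DNA sequences. The function names are hypothetical, only Watson-Crick A/T and C/G pairing is modeled, and the two strands are assumed to pair in antiparallel orientation without gaps; analogs such as LNA or PNA are not handled.

```python
# Watson-Crick partners for unmodified DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Return the base-by-base complement, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Return the complement read in the opposite direction."""
    return complement(seq)[::-1]


def paired_fraction(first, second):
    """Fraction of positions that pair when two equal-length strands,
    both written 5' to 3', are aligned antiparallel.

    1.0 corresponds to complete complementarity; values between 0 and 1
    correspond to partial complementarity.
    """
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    first, second = first.upper(), second.upper()
    pairs = sum(
        1 for a, b in zip(first, reversed(second)) if complement(a) == b
    )
    return pairs / len(first)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))       # GCAT
    print(paired_fraction("ATGC", "GCAT"))  # 1.0
    print(paired_fraction("ATGC", "GCAA"))  # 0.75
```

In this sketch a blocking oligonucleotide written 5′ to 3′ would be fully complementary to its binding site when paired_fraction returns 1.0.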

As used herein, the term “digital counting” can refer to a method for estimating a number of target molecules in a sample. Digital counting can include the step of determining a number of unique labels that have been associated with targets in a sample. This stochastic methodology transforms the problem of counting molecules from one of locating and identifying identical molecules to a series of yes/no digital questions regarding detection of a set of predefined labels.
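The counting step described above can be sketched, in a non-limiting manner, as follows. The sketch assumes that each sequencing read has already been reduced to a (target, label) pair; the function name, the input format, and the example gene names and label sequences are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """Estimate molecules per target as the number of distinct labels seen.

    `reads` is an iterable of (target, label) pairs. Reads that share both
    target and label are treated as amplification copies of one molecule.
    """
    labels_by_target = defaultdict(set)
    for target, label in reads:
        labels_by_target[target].add(label)
    return {target: len(labels) for target, labels in labels_by_target.items()}


if __name__ == "__main__":
    # Hypothetical reads: two GAPDH reads share the label "AAC".
    reads = [
        ("GAPDH", "AAC"),
        ("GAPDH", "AAC"),
        ("GAPDH", "GTT"),
        ("ACTB", "CCA"),
    ]
    print(count_molecules(reads))  # {'GAPDH': 2, 'ACTB': 1}
```

Because duplicate (target, label) pairs collapse into one set entry, the count is insensitive to how many times each labeled molecule was amplified.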

As used herein, the term “label” or “labels” can refer to nucleic acid codes associated with a target within a sample. A label can be, for example, a nucleic acid label. A label can be an entirely or partially amplifiable label. A label can be an entirely or partially sequenceable label. A label can be a portion of a native nucleic acid that is identifiable as distinct. A label can be a known sequence. A label can comprise a junction of nucleic acid sequences, for example a junction of a native and non-native sequence. As used herein, the term “label” can be used interchangeably with the terms “index”, “tag,” or “label-tag.” Labels can convey information. For example, in various embodiments, labels can be used to determine an identity of a sample, a source of a sample, an identity of a cell, and/or a target.

As used herein, a “nucleic acid” can generally refer to a polynucleotide sequence, or fragment thereof. A nucleic acid can comprise nucleotides. A nucleic acid can be exogenous or endogenous to a cell. A nucleic acid can exist in a cell-free environment. A nucleic acid can be a gene or fragment thereof. A nucleic acid can be DNA. A nucleic acid can be RNA. A nucleic acid can comprise one or more analogs (e.g. altered backbone, sugar, or nucleobase). Some non-limiting examples of analogs include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g. rhodamine or fluorescein linked to the sugar), thiol containing nucleotides, biotin linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. “Nucleic acid”, “polynucleotide”, “target polynucleotide”, and “target nucleic acid” can be used interchangeably.

A nucleic acid can comprise one or more modifications (e.g., a base modification, a backbone modification), to provide the nucleic acid with a new or enhanced feature (e.g., improved stability). A nucleic acid can comprise a nucleic acid affinity tag. A nucleoside can be a base-sugar combination. The base portion of the nucleoside can be a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines. Nucleotides can be nucleosides that further include a phosphate group covalently linked to the sugar portion of the nucleoside. For those nucleosides that include a pentofuranosyl sugar, the phosphate group can be linked to the 2′, the 3′, or the 5′ hydroxyl moiety of the sugar. In forming nucleic acids, the phosphate groups can covalently link adjacent nucleosides to one another to form a linear polymeric compound. In turn, the respective ends of this linear polymeric compound can be further joined to form a circular compound; however, linear compounds are generally suitable. In addition, linear compounds may have internal nucleotide base complementarity and may therefore fold in a manner as to produce a fully or partially double-stranded compound. Within nucleic acids, the phosphate groups can commonly be referred to as forming the internucleoside backbone of the nucleic acid. The linkage or backbone of the nucleic acid can be a 3′ to 5′ phosphodiester linkage.

A nucleic acid can comprise a modified backbone and/or modified internucleoside linkages. Modified backbones can include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone. Suitable modified nucleic acid backbones containing a phosphorus atom therein can include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates such as 3′-alkylene phosphonates, 5′-alkylene phosphonates, chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates, and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs, and those having inverted polarity wherein one or more internucleotide linkages is a 3′ to 3′, a 5′ to 5′ or a 2′ to 2′ linkage.

A nucleic acid can comprise polynucleotide backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These can include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH₂ component parts.

A nucleic acid can comprise a nucleic acid mimetic. The term “mimetic” can be intended to include polynucleotides wherein only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups; replacement of only the furanose ring can also be referred to as being a sugar surrogate. The heterocyclic base moiety or a modified heterocyclic base moiety can be maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid can be a peptide nucleic acid (PNA). In a PNA, the sugar-backbone of a polynucleotide can be replaced with an amide containing backbone, in particular an aminoethylglycine backbone. The nucleotides can be retained and are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone. The backbone in PNA compounds can comprise two or more linked aminoethylglycine units, which gives PNA an amide containing backbone. The heterocyclic base moieties can be bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.

A nucleic acid can comprise a morpholino backbone structure. For example, a nucleic acid can comprise a 6-membered morpholino ring in place of a ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage can replace a phosphodiester linkage.

A nucleic acid can comprise linked morpholino units (i.e. morpholino nucleic acid) having heterocyclic bases attached to the morpholino ring. Linking groups can link the morpholino monomeric units in a morpholino nucleic acid. Non-ionic morpholino-based oligomeric compounds can have fewer undesired interactions with cellular proteins. Morpholino-based polynucleotides can be nonionic mimics of nucleic acids. A variety of compounds within the morpholino class can be joined using different linking groups. A further class of polynucleotide mimetic can be referred to as cyclohexenyl nucleic acids (CeNA). The furanose ring normally present in a nucleic acid molecule can be replaced with a cyclohexenyl ring. CeNA DMT protected phosphoramidite monomers can be prepared and used for oligomeric compound synthesis using phosphoramidite chemistry. The incorporation of CeNA monomers into a nucleic acid chain can increase the stability of a DNA/RNA hybrid. CeNA oligoadenylates can form complexes with nucleic acid complements with similar stability to the native complexes. A further modification can include Locked Nucleic Acids (LNAs) in which the 2′-hydroxyl group is linked to the 4′ carbon atom of the sugar ring, thereby forming a 2′-C,4′-C-oxymethylene linkage and thus a bicyclic sugar moiety. The linkage can be a methylene (—CH2—)n group bridging the 2′ oxygen atom and the 4′ carbon atom, wherein n is 1 or 2. LNA and LNA analogs can display very high duplex thermal stabilities with complementary nucleic acid (Tm=+3 to +10° C.), stability towards 3′-exonucleolytic degradation and good solubility properties.

A nucleic acid may also include nucleobase (often referred to simply as “base”) modifications or substitutions. As used herein, “unmodified” or “natural” nucleobases can include the purine bases (e.g. adenine (A) and guanine (G)) and the pyrimidine bases (e.g. thymine (T), cytosine (C) and uracil (U)). Modified nucleobases can include other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (—C≡C—CH3) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-aminoadenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Modified nucleobases can include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g. 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), and pyridoindole cytidine (H-pyrido(3′,2′:4,5)pyrrolo[2,3-d]pyrimidin-2-one).

As used herein, the term “sample” can refer to a composition comprising targets. Suitable samples for analysis by the disclosed methods, devices, and systems include cells, single cells, tissues, organs, or organisms.

As used herein, the term “sampling device” or “device” can refer to a device which may take a section of a sample and/or place the section on a substrate. A sampling device can refer to, for example, a fluorescence activated cell sorting (FACS) machine, a cell sorter machine, a biopsy needle, a biopsy device, a tissue sectioning device, a microfluidic device, a blade grid, and/or a microtome.

As used herein, the term “solid support” can refer to discrete solid or semi-solid surfaces to which a plurality of stochastic barcodes may be attached. A solid support may encompass any type of solid, porous, or hollow sphere, ball, bearing, cylinder, or other similar configuration composed of plastic, ceramic, metal, or polymeric material (e.g., hydrogel) onto which a nucleic acid may be immobilized (e.g., covalently or non-covalently). A solid support may comprise a discrete particle that may be spherical (e.g., microspheres) or have a non-spherical or irregular shape, such as cubic, cuboid, pyramidal, cylindrical, conical, oblong, or disc-shaped, and the like. A plurality of solid supports spaced in an array may not comprise a substrate. A solid support may be used interchangeably with the term “bead.” As used herein, “solid support” and “substrate” can be used interchangeably.

As used herein, the term “stochastic barcode” refers to a polynucleotide sequence comprising labels of the present disclosure. A stochastic barcode can be a polynucleotide sequence that can be used for stochastic barcoding. Stochastic barcodes can be used to quantify targets within a sample. Stochastic barcodes can be used to control for errors which may occur after a label is associated with a target. For example, a stochastic barcode can be used to assess amplification or sequencing errors. A stochastic barcode associated with a target can be called a stochastic barcode-target or stochastic barcode-tag-target.

As used herein, the term “stochastic barcoding” refers to the random labeling (e.g., barcoding) of nucleic acids. Stochastic barcoding can utilize a recursive Poisson strategy to associate and quantify labels associated with targets. As used herein, the term “stochastic barcoding” can be used interchangeably with “stochastic labeling.”
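The disclosure does not specify the recursive Poisson strategy in this section. Purely as an illustrative assumption, the following Python sketch uses a simple occupancy model instead: each of n molecules of a target independently receives one of m equally likely labels, so some molecules may share a label by chance. The function names are hypothetical.

```python
import math


def expected_distinct_labels(num_molecules, num_labels):
    """Expected number of distinct labels observed when each molecule
    independently draws one of num_labels equally likely labels."""
    if num_labels < 2 or num_molecules < 0:
        raise ValueError("need num_labels >= 2 and num_molecules >= 0")
    return num_labels * (1 - (1 - 1 / num_labels) ** num_molecules)


def estimate_molecules(distinct_labels, num_labels):
    """Invert the occupancy model: estimate the number of molecules from
    the number of distinct labels observed for one target."""
    if num_labels < 2:
        raise ValueError("need num_labels >= 2")
    if not 0 <= distinct_labels < num_labels:
        raise ValueError("label pool is saturated or count is invalid")
    return math.log(1 - distinct_labels / num_labels) / math.log(
        1 - 1 / num_labels
    )


if __name__ == "__main__":
    # With 1,000 labels, 50 molecules are expected to show slightly fewer
    # than 50 distinct labels because of chance label collisions.
    observed = expected_distinct_labels(50, 1000)
    print(observed)
    print(estimate_molecules(observed, 1000))
```

Under this assumed model, the correction matters little when the label pool is much larger than the number of molecules per target, and grows as the pool approaches saturation.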

As used herein, the term “target” can refer to a composition which can be associated with a stochastic barcode. Exemplary suitable targets for analysis by the disclosed methods, devices, and systems include oligonucleotides, DNA, RNA, mRNA, microRNA, tRNA, and the like. Targets can be single or double stranded. In some embodiments, targets can be proteins, polypeptides or peptides. In some embodiments, targets are lipids. As used herein, “target” can be used interchangeably with “species”.

The term “reverse transcriptases” can refer to a group of enzymes having reverse transcriptase activity (i.e., that catalyze synthesis of DNA from an RNA template). In general, such enzymes include, but are not limited to, retroviral reverse transcriptase, retrotransposon reverse transcriptase, retroplasmid reverse transcriptases, retron reverse transcriptases, bacterial reverse transcriptases, group II intron-derived reverse transcriptase, and mutants, variants or derivatives thereof. Non-retroviral reverse transcriptases include non-LTR retrotransposon reverse transcriptases, retroplasmid reverse transcriptases, retron reverse transcriptases, and group II intron reverse transcriptases. Examples of group II intron reverse transcriptases include the Lactococcus lactis Ll.LtrB intron reverse transcriptase, the Thermosynechococcus elongatus TeI4c intron reverse transcriptase, or the Geobacillus stearothermophilus GsI-IIC intron reverse transcriptase. Other classes of reverse transcriptases can include many classes of non-retroviral reverse transcriptases (i.e., retrons, group II introns, and diversity-generating retroelements among others).

Methods of Selective Amplification and/or Extension

Some embodiments disclosed herein provide methods of selective amplification and/or extension of a plurality of nucleic acid target molecules in a sample. The methods and compositions can, for example, reduce the amplification and/or extension of undesirable nucleic acid species in the sample, allow selective removal of undesirable nucleic acid species of the sample, or both.

For example, a sample can comprise a plurality of nucleic acid target molecules, and one or more undesirable nucleic acid species. In some embodiments, the method can significantly reduce the amplification, the extension, or both of the one or more undesirable nucleic acid species as compared to the plurality of nucleic acid target molecules in the sample. For example, the amplification and/or extension of the one or more undesirable nucleic acid species can be reduced by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99% or more in comparison to the amplification and/or extension of at least one of the nucleic acid target molecules in the sample, or the average amplification and/or extension of the one or more of the nucleic acid target molecules in the sample. In some embodiments, the methods disclosed herein can significantly reduce the amplification and/or extension of the one or more undesirable nucleic acid species as compared to the nucleic acid target molecules without significantly affecting the amplification and/or extension of the nucleic acid target molecules in the sample.
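One non-limiting way to express such a reduction numerically is sketched below in Python. This section does not specify how amplification or extension is measured, so the sketch assumes that amplicon counts (for example, read counts or molecular label counts) are available with and without the blocking oligonucleotide; the function names and the example numbers are hypothetical.

```python
def percent_reduction(count_without_blocker, count_with_blocker):
    """Percent reduction of a species when the blocker is present."""
    if count_without_blocker <= 0 or count_with_blocker < 0:
        raise ValueError("baseline must be positive, treated non-negative")
    return (
        100.0
        * (count_without_blocker - count_with_blocker)
        / count_without_blocker
    )


def percent_reduction_relative_to_target(
    undesirable_without, undesirable_with, target_without, target_with
):
    """Percent reduction of an undesirable species after normalizing each
    condition to a target species measured in the same condition."""
    if min(undesirable_without, target_without, target_with) <= 0:
        raise ValueError("baseline and target counts must be positive")
    ratio_without = undesirable_without / target_without
    ratio_with = undesirable_with / target_with
    return 100.0 * (1 - ratio_with / ratio_without)


if __name__ == "__main__":
    # Hypothetical counts: undesirable species 1000 -> 50 with the blocker,
    # target species 1000 -> 900 with the blocker.
    print(percent_reduction(1000, 50))  # 95.0
    print(percent_reduction_relative_to_target(1000, 50, 1000, 900))
```

The normalized form corresponds to a comparison against a target molecule, so a blocker that also slightly lowers target yield is not credited for that loss.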

As used herein, a “nucleic acid species” refers to polynucleotides (for example, single-stranded polynucleotides) that are the same or substantially the same in sequence, or are complements of one another, or are capable of hybridizing to one another, or are transcripts from the same genetic locus, or encode the same protein or fragment thereof. In some embodiments, members of a nucleic acid species are at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% homologous to one another, or to the complement thereof. In some embodiments, members of a species can hybridize to one another under high-stringency hybridization conditions. In some embodiments, members of a species can hybridize to one another under moderate-stringency hybridization conditions. In some embodiments, members of a species can hybridize to one another under low-stringency hybridization conditions. In some embodiments, members of a species are transcripts from the same genetic locus and the transcripts can be of the same or different length. The species is, in some embodiments, genomic DNA, ribosomal RNA (rRNA), mitochondrial DNA (mtDNA), cDNA, mRNA, or a combination thereof.
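The sequence-identity criterion above can be illustrated with a minimal sketch. It assumes ungapped, equal-length sequences written 5′ to 3′, and treats "homologous" as simple positional identity; the function names and the 90% default threshold are hypothetical choices for illustration, not part of the disclosure.

```python
# Illustrative sketch only: function names and the default threshold are assumptions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def same_species(a: str, b: str, threshold: float = 90.0) -> bool:
    """True if b matches a, or the complementary strand of a, at or above threshold."""
    direct = percent_identity(a, b)
    via_complement = percent_identity(a, reverse_complement(b))
    return max(direct, via_complement) >= threshold
```

A real comparison would normally use a gapped alignment; the positional comparison here is only meant to make the percentage thresholds concrete.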

In some embodiments, the methods and compositions disclosed herein can reduce the amplification and/or extension of one or more undesirable nucleic acid species in a sample. For example, the methods and compositions disclosed herein can reduce the amplification and/or extension of at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 50, at least 100, at least 200, at least 500, at least 1,000, or more, undesirable nucleic acid species in the sample. In some embodiments, the methods and compositions disclosed herein can reduce the amplification and/or extension by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% of each of the one or more undesirable nucleic acid species in the sample. In some embodiments, the methods and compositions disclosed herein abolish the amplification and/or extension of each of the one or more undesirable nucleic acid species in the sample. In some embodiments, the methods and compositions disclosed herein can reduce amplification and/or extension by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% of at least one of the one or more undesirable nucleic acid species. In some embodiments, the methods and compositions disclosed herein abolish amplification and/or extension of at least one of the one or more undesirable nucleic acid species. In some embodiments, the methods and compositions disclosed herein reduce the amplification and/or extension of the total of undesirable nucleic acid species.

In some embodiments, the methods and compositions disclosed herein can reduce the amplification and/or extension of one or more undesirable nucleic acid species without significantly reducing amplification and/or extension of the nucleic acid target molecules in the same sample. For example, in some embodiments, the methods and compositions disclosed herein can reduce the amplification and/or extension by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% for each of the one or more undesirable nucleic acid species without significantly reducing amplification and/or extension of the nucleic acid target molecules. In some embodiments, the methods and compositions disclosed herein can reduce the amplification and/or extension by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of the total of undesirable nucleic acid species without significantly reducing amplification and/or extension of the nucleic acid target molecules. In some embodiments, the methods and compositions disclosed herein can reduce the amplification and/or extension of one or more undesirable nucleic acid species while keeping at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of the amplification and/or extension of each of the nucleic acid target molecules. In some embodiments, the methods and compositions disclosed herein can reduce the amplification and/or extension of one or more undesirable nucleic acid species while keeping at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of the amplification and/or extension of at least one of the nucleic acid target molecules. In some embodiments, the methods and compositions disclosed herein can reduce the amplification and/or extension of one or more undesirable nucleic acid species while keeping at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of the amplification and/or extension of the total of the nucleic acid target molecules.

As shown in FIG. 1, a sample comprising a whole transcriptome amplification (WTA) product 100 contains an undesirable nucleic acid species 105 and a nucleic acid target molecule 110. During an amplification reaction 120, an LNA/PNA blocking oligonucleotide 125 that specifically binds to the undesirable nucleic acid species 105 (binding shown at the 3′ end of 105 as a non-limiting example) is provided, which reduces or inhibits the amplification of the undesirable nucleic acid species 105. On the other hand, a copy 130 of the nucleic acid target molecule 110 is synthesized. In some embodiments, the amplification reaction 120 can be used in the ending amplification step of WTA after end selection. In some embodiments, the hybridization complex between the undesirable nucleic acid species 105 and the blocking oligonucleotide 125 may be removed, for example, by immobilizing it to a solid support. Completion 140 of the amplification reaction results in a library 150 comprising multiple copies 145 of the nucleic acid target molecule 110.

Nucleic Acid Target Molecules

In some embodiments, the methods disclosed herein comprise providing a sample comprising a plurality of nucleic acid target molecules. It would be appreciated by one of skill in the art that the plurality of nucleic acid target molecules can comprise a variety of nucleic acid target molecules. For example, the nucleic acid target molecules can comprise DNA molecules, RNA molecules, genomic DNA molecules, cDNA molecules, mRNA molecules, rRNA molecules, mtDNA, siRNA molecules, or any combination thereof. The nucleic acid target molecule can be double-stranded or single-stranded. In some embodiments, the plurality of nucleic acid target molecules can comprise polyA RNA molecules. In some embodiments, the plurality of nucleic acid target molecules comprise at least 100, at least 1,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000, at least 1,000,000, or more nucleic acid species. In some embodiments, the plurality of nucleic acid target molecules can be from a sample, such as a single cell, a tissue, or a plurality of cells. In some embodiments, the plurality of nucleic acid target molecules can be pooled from a plurality of samples, such as a plurality of single cells or samples from different subjects (e.g., patients).

In some embodiments, the sample can comprise one or more undesirable nucleic acid species. As used herein, an “undesirable nucleic acid species” refers to a nucleic acid species that is present, e.g., in high amount, in a sample, for example the nucleic acid species representing 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 20%, 30%, 40%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, or more, or a range between any two of these values, of the nucleic acid content in the sample. In some embodiments, the sample can comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 50, at least 100, at least 200, at least 500, at least 1,000, or more, undesirable nucleic acid species. In some embodiments, the total of all the undesirable nucleic acid species represents at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, or more of the nucleic acid content in the sample. In some embodiments, undesirable nucleic acid species can comprise polynucleotides encoding one or more ribosomal proteins. In some embodiments, undesirable nucleic acid species comprise rRNA. In some embodiments, undesirable nucleic acid species can comprise polynucleotides encoding one or more mitochondrial proteins. In some embodiments, undesirable nucleic acid species comprise mtDNA. In some embodiments, undesirable nucleic acid species can comprise polynucleotides encoding one or more housekeeping proteins. In some embodiments, undesirable nucleic acid species can comprise mRNA, rRNA, mtRNA, genomic DNA, intronic sequence, high abundance sequence, and any combination thereof.

In some embodiments, the plurality of nucleic acid target molecules comprises an unnormalized nucleic acid library, a partially normalized nucleic acid library, or a nucleic acid library that has been normalized by other methods, such as a cDNA library, a genomic DNA library, or the like. In some embodiments, the plurality of nucleic acid target molecules can comprise a pooled unnormalized nucleic acid library, such as a pooled unnormalized nucleic acid library constructed from a plurality of unnormalized nucleic acid libraries each representing a single cell. In some embodiments, the unnormalized nucleic acid library is a cDNA library. In some embodiments, the unnormalized nucleic acid library is a genomic library. In some embodiments, the unnormalized nucleic acid library is a single-cell nucleic acid library.

Blocking Oligonucleotides

In some embodiments, the methods disclosed herein comprise providing a blocking oligonucleotide that specifically binds to at least one of the one or more undesirable nucleic acid species. The blocking oligonucleotides can be provided at any point during the methods disclosed herein so that they can reduce the amplification and/or extension of the undesirable nucleic acid species. For example, the blocking oligonucleotides can be provided before, during or after the extension step, before or during the amplification step, before, during or after the step of providing a plurality of oligonucleotide probes, before, during or after contacting the plurality of oligonucleotide probes with the plurality of nucleic acid target molecules for hybridization, or any combination thereof. A “blocking oligonucleotide” as used herein refers to a nucleic acid molecule that can specifically bind to at least one of the one or more undesirable nucleic acid species, whereby the specific binding between the blocking oligonucleotide and the one or more undesirable nucleic acid species can reduce the amplification or extension (e.g., reverse transcription) of the one or more undesirable nucleic acid species. For example, the blocking oligonucleotide can comprise a nucleic acid sequence capable of hybridizing with one or more undesirable nucleic acid species. In some embodiments, a plurality of blocking oligonucleotides can be provided. The plurality of blocking oligonucleotides can specifically bind to at least 1, at least 2, at least 5, at least 10, at least 100, at least 1,000 or more of the one or more undesirable nucleic acid species. The location at which a blocking oligonucleotide specifically binds to an undesirable nucleic acid species can vary. For example, the blocking oligonucleotide can specifically bind to a sequence close to the 5′ end of the undesirable nucleic acid species. In some embodiments, the blocking oligonucleotide can specifically bind to within 10 nt, 20 nt, 30 nt, 40 nt, 50 nt, 100 nt, 200 nt, 300 nt, 400 nt, 500 nt, or 1,000 nt of the 5′ end of at least one of the one or more undesirable nucleic acid species. In some embodiments, the blocking oligonucleotide can specifically bind to a sequence close to the 3′ end of the undesirable nucleic acid species. For example, the blocking oligonucleotide can specifically bind to within 10 nt, 20 nt, 30 nt, 40 nt, 50 nt, 100 nt, 200 nt, 300 nt, 400 nt, 500 nt, or 1,000 nt of the 3′ end of at least one of the one or more undesirable nucleic acid species. As another example, the blocking oligonucleotide can specifically bind to a sequence in the middle portion of the undesirable nucleic acid species. In some embodiments, the blocking oligonucleotide can specifically bind to within 10 nt, 20 nt, 30 nt, 40 nt, 50 nt, 100 nt, 200 nt, 300 nt, 400 nt, 500 nt, or 1,000 nt from the middle point of at least one of the one or more undesirable nucleic acid species.
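The positional language above ("within N nt of the 5′ end" or "3′ end") can be made concrete with a short sketch. It assumes both sequences are written 5′ to 3′, that the blocking oligonucleotide is perfectly complementary to its site, and that "within N nt" means the gap between the nearest edge of the binding site and the terminus is at most N; all function names are hypothetical.

```python
from typing import Optional, Tuple

# Illustrative sketch only: names and the exact-match assumption are hypothetical.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_gaps(target: str, blocker: str) -> Optional[Tuple[int, int]]:
    """Return (gap to 5' end, gap to 3' end) of the first exact binding site, or None."""
    site = reverse_complement(blocker).upper()
    start = target.upper().find(site)
    if start < 0:
        return None
    return start, len(target) - (start + len(site))


def binds_within(target: str, blocker: str, max_nt: int, end: str = "3") -> bool:
    """True if the binding site lies within max_nt of the chosen end ('5' or '3')."""
    if end not in ("5", "3"):
        raise ValueError("end must be '5' or '3'")
    gaps = end_gaps(target, blocker)
    if gaps is None:
        return False
    gap = gaps[0] if end == "5" else gaps[1]
    return gap <= max_nt
```

For instance, a blocker that is the reverse complement of the last ten bases of a target gives a 3′ gap of 0, so `binds_within(target, blocker, 10, end="3")` is true under these assumptions.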

In some embodiments, the specific binding between the blocking oligonucleotide and the undesirable nucleic acid species can reduce the amplification and/or extension of the undesirable nucleic acid species by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more.

It is contemplated that the blocking oligonucleotide may reduce the amplification and/or extension of the undesirable nucleic acid species by, for example, forming a hybridization complex with the undesirable nucleic acid species having a high melting temperature (T_(m)), by not being able to function as a primer for a reverse transcriptase or a polymerase, a combination thereof, etc. In some embodiments, the blocking oligonucleotide can have a T_(m) that is, is about, or is at least, 50° C., 60° C., 65° C., 70° C., 75° C., 80° C., 85° C., 90° C., 95° C., or a range between any two of these values. In some embodiments, the blocking oligonucleotide can reduce the amplification and/or extension of the undesirable nucleic acid species by competing with the amplification and/or extension primers for hybridization with the undesirable nucleic acid species.

The blocking oligonucleotide can, in some embodiments, comprise one or more non-natural nucleotides. Non-natural nucleotides can be, for example, photolabile or triggerable nucleotides. Examples of non-natural nucleotides can include, but are not limited to, peptide nucleic acid (PNA), morpholino and locked nucleic acid (LNA), as well as glycol nucleic acid (GNA) and threose nucleic acid (TNA). In some embodiments, the blocking oligonucleotide is a chimeric oligonucleotide, such as an LNA/PNA/DNA chimera, an LNA/DNA chimera, a PNA/DNA chimera, a GNA/DNA chimera, a TNA/DNA chimera, or a combination thereof.

The melting temperature (T_(m)) of a blocking oligonucleotide can be modified, in some embodiments, by adjusting the length of the blocking oligonucleotide. For example, a blocking oligonucleotide can have a length that is, is about, is less than, or is more than, 10 nt, 15 nt, 20 nt, 25 nt, 30 nt, 35 nt, 40 nt, 45 nt, 50 nt, 60 nt, 70 nt, 80 nt, 90 nt, 100 nt, 200 nt, or a range between any two of the above values.
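The dependence of T_(m) on length can be sketched with two widely used textbook approximations for unmodified DNA oligonucleotides: the Wallace rule for short oligonucleotides and a basic GC-content formula for longer ones. These are rough estimates that ignore salt, oligonucleotide concentration, and nearest-neighbor effects, and they do not model LNA or PNA residues, which generally raise duplex stability; the function names are hypothetical and the disclosure does not prescribe any particular T_(m) calculation.

```python
# Illustrative sketch only: rough DNA-only approximations, not a design tool.

def _base_counts(seq: str):
    """Return (number of A/T bases, number of G/C bases) in a DNA sequence."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s) or not s:
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return at, gc


def wallace_tm(seq: str) -> float:
    """Wallace rule, typically applied to short oligos: 2*(A+T) + 4*(G+C), in deg C."""
    at, gc = _base_counts(seq)
    return 2.0 * at + 4.0 * gc


def basic_tm(seq: str) -> float:
    """Basic GC-content estimate for longer oligos: 64.9 + 41*(GC - 16.4)/N, in deg C."""
    at, gc = _base_counts(seq)
    return 64.9 + 41.0 * (gc - 16.4) / (at + gc)
```

At a fixed GC fraction, `basic_tm` increases with length, which mirrors the statement that T_(m) can be tuned by adjusting the length of the blocking oligonucleotide.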

In some embodiments, the T_(m) of a blocking oligonucleotide is modified by the number of DNA residues in the blocking oligonucleotide that comprises an LNA/DNA chimera or a PNA/DNA chimera. For example, a blocking oligonucleotide that comprises an LNA/DNA chimera or a PNA/DNA chimera can have a percentage of DNA residues that is, is about, is less than, or is more than, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, or a range between any two of the above values.

In some embodiments, a blocking oligonucleotide can be designed to be incapable of functioning as a primer or probe for an amplification and/or extension reaction. For example, the blocking oligonucleotide may be incapable of functioning as a primer for a reverse transcriptase or a polymerase. For example, a blocking oligonucleotide that comprises an LNA/DNA chimera or a PNA/DNA chimera can be designed to have a certain percentage of LNA or PNA residues, or to have LNA or PNA residues at certain locations, such as close to or at the 3′ end, 5′ end, or in the middle portion of the oligonucleotide. In some embodiments, a blocking oligonucleotide that comprises an LNA/DNA chimera or a PNA/DNA chimera can have a percentage of LNA or PNA residues that is, is about, is less than, or is more than, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, or a range between any two of the above values.
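The residue percentages and positions described for chimeric blocking oligonucleotides can be computed with a small sketch. The notation used here, in which a "+" prefix marks the following base as an LNA residue and unmarked bases are DNA, is an assumption adopted for illustration, as are the function names.

```python
from typing import List, Tuple

# Illustrative sketch only: the "+N" notation for LNA residues is an assumption.

def parse_chimera(notation: str) -> List[Tuple[str, bool]]:
    """Parse e.g. '+AC+GT' into [(base, is_lna), ...], written 5' to 3'."""
    residues: List[Tuple[str, bool]] = []
    lna_next = False
    for ch in notation:
        if ch == "+":
            if lna_next:
                raise ValueError("two '+' markers in a row")
            lna_next = True
        else:
            residues.append((ch.upper(), lna_next))
            lna_next = False
    if lna_next:
        raise ValueError("dangling '+' at the end of the notation")
    if not residues:
        raise ValueError("empty oligonucleotide")
    return residues


def percent_lna(notation: str) -> float:
    """Percentage of residues that are LNA."""
    residues = parse_chimera(notation)
    return 100.0 * sum(is_lna for _, is_lna in residues) / len(residues)


def three_prime_is_lna(notation: str) -> bool:
    """True if the 3'-terminal residue is an LNA residue."""
    return parse_chimera(notation)[-1][1]
```

The DNA percentage of the same chimera is simply `100.0 - percent_lna(notation)` under this two-residue-type assumption.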

In some embodiments, the methods disclosed herein can comprise removing the hybridized complex formed between the blocking oligonucleotide and the undesirable nucleic acid species. For example, the blocking oligonucleotides can comprise an affinity moiety. The affinity moiety can be a functional group selected from the group consisting of biotin, streptavidin, heparin, an aptamer, a click-chemistry moiety, digoxigenin, primary amine(s), carboxyl(s), hydroxyl(s), aldehyde(s), ketone(s), and any combination thereof. In some embodiments, the affinity moiety is biotin. In some embodiments, the blocking oligonucleotide can be immobilized, through the affinity moiety, to a solid support having a binding partner for the affinity moiety. In some embodiments, the binding partner is streptavidin.

Oligonucleotide Probes

In some embodiments, the methods disclosed herein comprise providing a plurality of oligonucleotide probes, wherein each of the plurality of oligonucleotide probes comprises a molecular label sequence and a binding region. In some embodiments, the methods disclosed herein comprise contacting the plurality of oligonucleotide probes with the plurality of nucleic acid target molecules for hybridization. The oligonucleotide probes can comprise a binding region that hybridizes to one or more of the plurality of nucleic acid target molecules and one or more of the undesirable nucleic acid species. In some embodiments, the binding region can be target specific. For example, the binding region is configured to bind specific sequence(s). In some embodiments, the binding region can be target nonspecific. In some embodiments, the binding region comprises or consists of a poly-dT sequence. In some embodiments, the oligonucleotide probes can comprise a stochastic barcode. In some embodiments, the oligonucleotide probes can comprise a molecular label sequence, a cell label sequence, a sample label sequence, a location label sequence, a binding site for a universal primer, or any combination thereof.

It is contemplated that the methods and compositions disclosed herein can be used in conjunction with molecular label sequences, for example, oligonucleotide probes that comprise molecular label sequences. Accordingly, in some embodiments, the species of nucleic acid molecules as disclosed herein can include polynucleotides in the plurality of nucleic acid molecules that are the same or the complement of one another, or are capable of hybridizing to one another, or are transcripts from the same genetic locus, or encode the same protein or fragment thereof, etc., but that are associated with different molecular label sequences. It would be appreciated that molecular label sequences can be used to identify occurrences of a nucleic acid species.

A molecular label sequence can comprise a nucleic acid sequence that provides identifying information for the specific nucleic acid. A molecular label sequence can comprise a nucleic acid sequence that provides a counter for the specific occurrence of the target nucleic acid. A molecular label sequence can be, for example, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more, or a range between any of these values, nucleotides in length. A molecular label sequence can, for example, be at most about 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4 or fewer nucleotides in length.
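The link between molecular label length and its usefulness as a counter can be sketched numerically. A label of n nucleotides drawn from the four standard bases has 4^n possible sequences. The second function below additionally assumes that each molecule receives a label uniformly at random and independently, which is a simplifying assumption rather than something stated in the disclosure; the function names are hypothetical.

```python
# Illustrative sketch only: assumes a four-letter alphabet and uniform random labeling.

def label_space(length: int) -> int:
    """Number of distinct label sequences of the given length (4**length)."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return 4 ** length


def expected_distinct_labels(molecules: int, length: int) -> float:
    """Expected number of distinct labels seen when `molecules` copies are labeled."""
    if molecules < 0:
        raise ValueError("molecules must be non-negative")
    k = label_space(length)
    return k * (1.0 - (1.0 - 1.0 / k) ** molecules)
```

When the label space is much larger than the number of copies of a species, the expected number of distinct labels stays close to the number of molecules, which is what lets distinct labels stand in for molecule counts.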

It would be appreciated that in some embodiments, the methods and compositions disclosed herein may reduce amplification of undesirable nucleic acid species without significantly reducing the number of different molecular label sequences associated with the other nucleic acid target molecules. For example, the methods and compositions disclosed herein can reduce amplification of undesirable nucleic acid species while retaining at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of the different molecular label sequences associated with the other nucleic acid target molecules. In some embodiments, the methods and compositions disclosed herein can reduce amplification of undesirable nucleic acid species by at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% while retaining at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of the different molecular label sequences associated with the other nucleic acid target molecules. In some embodiments, reducing amplification of undesirable nucleic acid species does not significantly reduce the number of different molecular label sequences associated with the other nucleic acid target molecules.

Extension

One or more extension reactions can be performed using the oligonucleotide probes that are hybridized to the plurality of nucleic acid target molecules to generate a plurality of extension products. In some embodiments, the oligonucleotide probes function as primers for the extension reaction, such as reverse transcription. The extension reactions can be performed with or without the presence of blocking oligonucleotides. In embodiments where blocking oligonucleotides are present in the extension reactions, the blocking oligonucleotides can reduce the extension of one or more undesirable nucleic acid species to which the blocking oligonucleotides specifically bind. In some embodiments, the blocking oligonucleotides do not significantly reduce the extension of the nucleic acid target molecules.

The plurality of nucleic acid target molecules can, in some embodiments, randomly associate with the oligonucleotide probes. Association can, for example, comprise hybridization of an oligonucleotide probe's binding region to a complementary portion of the target nucleic acid molecule (e.g., the oligo dT sequence of the stochastic barcode can interact with a poly-A tail of a target nucleic acid molecule). The assay conditions used for hybridization (e.g., buffer pH, ionic strength, temperature, etc.) can be chosen to promote formation of specific, stable hybrids.

The disclosure provides for methods of associating a molecular label with a target nucleic acid using reverse transcription.

Amplification

In some embodiments, the methods disclosed herein can comprise amplifying a sample wherein the sample comprises a plurality of nucleic acid target molecules and one or more undesirable nucleic acid species, or amplifying the plurality of extension products to generate a plurality of amplicons. In some embodiments, one or more nucleic acid amplification reactions can be performed to create multiple copies of the target nucleic acid molecules or the extension products. In some embodiments, primers can be added for the amplification reaction, such as PCR. The amplification reactions can be performed with or without the presence of blocking oligonucleotides. In embodiments where blocking oligonucleotides are present in the amplification reactions, the blocking oligonucleotides can reduce the amplification of one or more undesirable nucleic acid species to which the blocking oligonucleotides specifically bind. In some embodiments, the blocking oligonucleotides do not significantly reduce the amplification of the nucleic acid target molecules.

Amplification can be performed, in some embodiments, in a multiplexed manner, wherein multiple target nucleic acid sequences are amplified simultaneously. The amplification reaction can be used, for example, to add sequencing adaptors to the nucleic acid molecules. The amplification reactions can comprise amplifying at least a portion of a sample label, if present. The amplification reactions can comprise amplifying at least a portion of the cellular and/or molecular label. The amplification reactions can comprise amplifying at least a portion of a sample tag, a cell label, a spatial label, a molecular label, a target nucleic acid, or a combination thereof. The amplification reactions can, for example, comprise amplifying at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 100% of the plurality of target nucleic acids. The method may further comprise conducting one or more cDNA synthesis reactions to produce one or more cDNA copies of target-barcode molecules comprising a sample label, a cell label, a spatial label, and/or a molecular label.

In some embodiments, amplification can be performed using a polymerase chain reaction (PCR). As used herein, “PCR” refers to a reaction for the in vitro amplification of specific DNA sequences by the simultaneous primer extension of complementary strands of DNA. As used herein, PCR encompasses derivative forms of the reaction, including but not limited to, RT-PCR, real-time PCR, nested PCR, quantitative PCR, multiplexed PCR, digital PCR, and assembly PCR.

Amplification of the labeled nucleic acids can comprise non-PCR based methods. Examples of non-PCR based methods include, but are not limited to, multiple displacement amplification (MDA), transcription-mediated amplification (TMA), whole transcriptome amplification (WTA), whole genome amplification (WGA), nucleic acid sequence-based amplification (NASBA), strand displacement amplification (SDA), real-time SDA, rolling circle amplification, or circle-to-circle amplification. Other non-PCR-based amplification methods include multiple cycles of DNA-dependent RNA polymerase-driven RNA transcription amplification or RNA-directed DNA synthesis and transcription to amplify DNA or RNA targets, a ligase chain reaction (LCR), a Qβ replicase (Qβ) method, use of palindromic probes, strand displacement amplification, oligonucleotide-driven amplification using a restriction endonuclease, an amplification method in which a primer is hybridized to a nucleic acid sequence and the resulting duplex is cleaved prior to the extension reaction and amplification, strand displacement amplification using a nucleic acid polymerase lacking 5′ exonuclease activity, rolling circle amplification, and ramification extension amplification (RAM). In some instances, the amplification may not produce circularized transcripts.

In some instances, the methods disclosed herein further comprise conducting a polymerase chain reaction on the labeled nucleic acid (e.g., labeled-RNA, labeled-DNA, labeled-cDNA) to produce a labeled amplicon. The labeled amplicon can, for example, be a double-stranded molecule. The double-stranded molecule can comprise a double-stranded RNA molecule, a double-stranded DNA molecule, or an RNA molecule hybridized to a DNA molecule. One or both of the strands of the double-stranded molecule may comprise a sample label, a spatial label, a cell label, and/or a molecular label. The labeled amplicon can be a single-stranded molecule. The single-stranded molecule can comprise DNA, RNA, or a combination thereof. The nucleic acids of the disclosure comprise synthetic or altered nucleic acids.

Amplification can, for example, comprise use of one or more non-natural nucleotides. Non-natural nucleotides may comprise photolabile or triggerable nucleotides. Examples of non-natural nucleotides can include, but are not limited to, peptide nucleic acid (PNA), morpholino and locked nucleic acid (LNA), as well as glycol nucleic acid (GNA) and threose nucleic acid (TNA). Non-natural nucleotides may be added to one or more cycles of an amplification reaction. The addition of the non-natural nucleotides can be, for example, used to identify products at specific cycles or time points in the amplification reaction.

As described herein, conducting the one or more amplification reactions can comprise the use of one or more primers. A primer can, for example, comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 or more nucleotides. In some embodiments, the primer comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 or more nucleotides. For example, the primer can comprise 12 to 15 nucleotides. The one or more primers can, for example, anneal to at least a portion of the plurality of labeled target nucleic acid molecules and oligonucleotides. For example, the one or more primers can anneal to the 3′ end or 5′ end of the plurality of labeled target nucleic acid molecules and oligonucleotides. The one or more primers can, in some embodiments, anneal to an internal region of the plurality of labeled target nucleic acid molecules and oligonucleotides. The internal region of an oligonucleotide or target nucleic acid molecule can be, for example, at least about 50, 100, 150, 200, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 650, 700, 750, 800, 850, 900, or 1000 nucleotides from the 3′ end and/or 5′ end of the oligonucleotide or the target nucleic acid molecule. The one or more primers may comprise a fixed panel of primers. The one or more primers may comprise at least one or more custom primers. The one or more primers may comprise at least one or more control primers. The one or more primers may comprise at least one or more gene-specific primers.

The one or more primers can comprise any universal primer of the disclosure. The universal primer may anneal to a universal primer binding site. The one or more custom primers can, in some embodiments, anneal to a first sample label, a second sample label, a spatial label, a cell label, a molecular label, a target, or any combination thereof. The one or more primers may comprise a universal primer and a custom primer.

Any amplification scheme can be used in the methods of the present disclosure. For example, in one scheme, the first round PCR can amplify molecules (e.g., attached to the bead) using a gene specific primer and a primer against the universal Illumina sequencing primer 1 sequence. The second round of PCR can amplify the first PCR products using a nested gene specific primer flanked by Illumina sequencing primer 2 sequence, and a primer against the universal Illumina sequencing primer 1 sequence. The third round of PCR adds P5 and P7 and sample index to turn PCR products into an Illumina sequencing library. Sequencing using 150 bp×2 sequencing can reveal the cell label and molecular index on read 1, the gene on read 2, and the sample index on index 1 read.

Amplification can be performed in one or more rounds. In some instances there are multiple rounds of amplification. Amplification can comprise two or more rounds of amplification. The first amplification can be an extension off X′ to generate the gene specific region. The second amplification can occur when a sample nucleic acid hybridizes to the newly generated strand.

In some embodiments, hybridization does not need to occur at the end of a nucleic acid molecule. In some embodiments, a target nucleic acid within an intact strand of a longer nucleic acid is hybridized and amplified, for example, a target within a longer section of genomic DNA or mRNA. A target can be more than 50 nt, more than 100 nt, or more than 1000 nt from one end (e.g., 5′ end or 3′ end) of a polynucleotide.

Sequencing

In some embodiments, the extension products and/or the amplification products disclosed herein may be used for sequencing. Any suitable sequencing method known in the art can be used, preferably high-throughput approaches. For example, cyclic array sequencing using platforms such as Roche 454, Illumina Solexa, ABI-SOLiD, Ion Torrent, Complete Genomics, Pacific Biosciences, Helicos, or the Polonator platform, may also be utilized. Sequencing may comprise MiSeq sequencing and/or HiSeq sequencing. The selective extension and/or amplification methods disclosed herein can, in some embodiments, increase the efficiency of sequencing by decreasing the number of sequencing reads for the undesirable nucleic acid species.

In some embodiments, after using the selective extension and/or amplification methods described herein, the sequencing reads for the undesirable nucleic acid species are less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less, of the total sequencing reads. In some embodiments, the sequencing reads for the undesirable nucleic acid species are less than 40% of the total sequencing reads. In some embodiments, the sequencing reads for the undesirable nucleic acid species are less than 30% of the total sequencing reads. In some embodiments, the sequencing reads for the undesirable nucleic acid species are less than 20% of the total sequencing reads. In some embodiments, the sequencing reads for the undesirable nucleic acid species are less than 10% of the total sequencing reads. In some embodiments, after using the selective extension and/or amplification methods described herein, the sequencing reads for the undesirable nucleic acid species are reduced to less than 60%, less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, or less than 5% of the sequencing reads for the undesirable nucleic acid species obtained without using the selective extension and/or amplification methods described herein. In some embodiments, after using the selective extension and/or amplification methods described herein, the sequencing reads for the undesirable nucleic acid species are reduced to, or to about, 60%, 50%, 40%, 30%, 20%, 10%, 5%, 2%, 1%, 0.5%, or a range between any two of these values, of the sequencing reads for the undesirable nucleic acid species obtained without using the selective extension and/or amplification methods described herein.
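The two kinds of percentages recited above (share of total reads, and reads relative to a library prepared without blocking) can be illustrated with a short calculation. The following is a minimal sketch only: the species names and read counts are invented for illustration, the function names are hypothetical, and the comparison to the control assumes both libraries were sequenced to the same depth.

```python
def undesirable_read_fraction(reads_by_species, undesirable):
    """Return the fraction of all reads that map to undesirable species."""
    total = sum(reads_by_species.values())
    if total == 0:
        raise ValueError("no sequencing reads were counted")
    unwanted = sum(n for name, n in reads_by_species.items() if name in undesirable)
    return unwanted / total


def relative_to_control(reads_by_species, control_reads_by_species, undesirable):
    """Undesirable reads with blocking, as a fraction of those without blocking."""
    treated = sum(reads_by_species.get(name, 0) for name in undesirable)
    control = sum(control_reads_by_species.get(name, 0) for name in undesirable)
    if control == 0:
        raise ValueError("control library has no undesirable reads")
    return treated / control


# Hypothetical read counts (invented for illustration, not measured data).
UNDESIRABLE = {"ribosomal_mRNA", "mitochondrial_mRNA"}
control = {"ribosomal_mRNA": 600_000, "mitochondrial_mRNA": 200_000,
           "target_A": 150_000, "target_B": 50_000}
blocked = {"ribosomal_mRNA": 60_000, "mitochondrial_mRNA": 40_000,
           "target_A": 650_000, "target_B": 250_000}

# 100,000 undesirable reads out of 1,000,000 total reads.
fraction_blocked = undesirable_read_fraction(blocked, UNDESIRABLE)
# 100,000 undesirable reads versus 800,000 in the control library.
reduction = relative_to_control(blocked, control, UNDESIRABLE)
```

With these invented counts, the undesirable species account for 10% of total reads after blocking, and their reads are reduced to 12.5% of the control value.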

In some embodiments, the methods and compositions disclosed herein can improve sequencing efficiency by decreasing the sequencing reads:molecular label ratio of an undesirable nucleic acid species and/or increasing the sequencing reads:molecular label ratio of a nucleic acid target molecule. For example, the ratio of sequencing reads to molecular label for an undesirable nucleic acid species can be less than 20, less than 15, less than 10, less than 9, less than 8, less than 7, less than 6, less than 5, less than 4, less than 3, less than 2, or less than 1. In some embodiments, the ratio of sequencing reads to molecular label for an undesirable nucleic acid species is 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, or a range between any two of these values.
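As an illustration of the sequencing reads:molecular label ratio, the sketch below counts, for each species, the number of reads and the number of distinct molecular labels observed, and divides one by the other. The input format (one `(species, molecular_label)` pair per read), the function name, and the example reads are assumptions made for this sketch.

```python
from collections import defaultdict


def reads_per_molecular_label(reads):
    """Map each species to (number of reads) / (number of distinct molecular labels).

    `reads` is an iterable of (species, molecular_label) pairs, one per read.
    """
    read_counts = defaultdict(int)
    labels_seen = defaultdict(set)
    for species, molecular_label in reads:
        read_counts[species] += 1
        labels_seen[species].add(molecular_label)
    return {s: read_counts[s] / len(labels_seen[s]) for s in read_counts}


# Hypothetical reads: 12 reads over 2 labels for one species, 3 reads over 2 labels for another.
example_reads = (
    [("housekeeping_gene", "AACGTTGC")] * 6
    + [("housekeeping_gene", "GGTACCAT")] * 6
    + [("target_gene", "TTAGGCAT")] * 2
    + [("target_gene", "CAGTCAGT")] * 1
)
ratios = reads_per_molecular_label(example_reads)
```

Here `ratios["housekeeping_gene"]` is 6.0 and `ratios["target_gene"]` is 1.5. A high ratio means many reads are spent re-sequencing copies of the same labeled molecule.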

Kits

Disclosed herein are kits for selective amplification and/or extension of a plurality of nucleic acid target molecules in a sample, wherein the sample comprises a plurality of target nucleic acid species and one or more undesirable nucleic acid species. In some embodiments, the kit comprises a plurality of oligonucleotide probes, wherein each of the plurality of oligonucleotide probes comprises a molecular label sequence and a binding region; and a plurality of blocking oligonucleotides that specifically bind to a plurality of undesirable nucleic acid species in the sample, wherein each blocking oligonucleotide is unable to function as a primer for a reverse transcriptase or a polymerase.

In some embodiments, the kit further comprises a plurality of blocking oligonucleotides. The plurality of blocking oligonucleotides can, for example, specifically bind to at least 1, at least 2, at least 5, at least 10, at least 100, at least 1,000 or more undesirable nucleic acid species in the sample. In some embodiments, the blocking oligonucleotide can specifically bind to within 10 nt, 20 nt, 30 nt, 40 nt, 50 nt, 100 nt, 200 nt, 300 nt, 400 nt, 500 nt, or 1,000 nt of the 5′ end of the one or more undesirable nucleic acid species. In some embodiments, the blocking oligonucleotide can specifically bind to within 10 nt, 20 nt, 30 nt, 40 nt, 50 nt, 100 nt, 200 nt, 300 nt, 400 nt, 500 nt, or 1,000 nt of the 3′ end of the one or more undesirable nucleic acid species. In some embodiments, the blocking oligonucleotide can specifically bind to within 10 nt, 20 nt, 30 nt, 40 nt, 50 nt, 100 nt, 200 nt, 300 nt, 400 nt, 500 nt, or 1,000 nt surrounding the middle point of the one or more undesirable nucleic acid species.

It is contemplated that the blocking oligonucleotide may reduce the amplification and/or extension of the undesirable nucleic acid species by forming a hybridization complex with the undesirable nucleic acid species having a high melting temperature (T_(m)), by not being able to function as a primer for a reverse transcriptase or a polymerase, or a combination thereof. In some embodiments, the blocking oligonucleotide can have a T_(m) that is, is about, or is at least 50° C., 60° C., 65° C., 70° C., 75° C., 80° C., 85° C., 90° C., 95° C., or a range between any two of the above values. In some embodiments, the blocking oligonucleotide may reduce the amplification and/or extension of the undesirable nucleic acid species by competing with the amplification and/or extension primers for hybridization with the undesirable nucleic acid species.

The blocking oligonucleotide can, in some embodiments, comprise one or more non-natural nucleotides. Non-natural nucleotides can comprise photolabile or triggerable nucleotides. Examples of non-natural nucleotides can include, but are not limited to, peptide nucleic acid (PNA), morpholino and locked nucleic acid (LNA), as well as glycol nucleic acid (GNA) and threose nucleic acid (TNA). In some embodiments, the blocking oligonucleotide is a chimeric oligonucleotide, such as an LNA/PNA/DNA chimera, an LNA/DNA chimera, a PNA/DNA chimera, a GNA/DNA chimera, a TNA/DNA chimera, or a combination thereof.

It will be appreciated that the T_(m) of a blocking oligonucleotide can be modified by adjusting the length of the blocking oligonucleotide. For example, a blocking oligonucleotide can have a length that is, is about, is less than, or is more than 10 nt, 15 nt, 20 nt, 25 nt, 30 nt, 35 nt, 40 nt, 45 nt, 50 nt, 60 nt, 70 nt, 80 nt, 90 nt, 100 nt, 200 nt, or a range between any two of the above values.
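The dependence of melting temperature on length can be illustrated with the Wallace rule, a rough rule of thumb that estimates the melting temperature of a short DNA oligonucleotide as 2° C. per A or T plus 4° C. per G or C. The rule is not part of the present disclosure and is used here only as an illustrative assumption; it applies to unmodified DNA and does not account for LNA or PNA residues, salt concentration, or oligonucleotide concentration. The function name and the example sequence are hypothetical.

```python
def wallace_tm(sequence):
    """Rough melting temperature (degrees C) of a short DNA oligo by the Wallace rule."""
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical blocker sequence, truncated to increasing lengths.
full_length = "GCTAGCTAGGATCCATGCAAGT"
tm_by_length = {n: wallace_tm(full_length[:n]) for n in (14, 18, 22)}
```

Because every added residue contributes a positive term, the estimate can only increase as the oligonucleotide is lengthened, which is the qualitative behavior described above.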

In some embodiments, the T_(m) of a blocking oligonucleotide can be modified by adjusting the number of DNA residues in the blocking oligonucleotide that comprises an LNA/DNA chimera or a PNA/DNA chimera. For example, a blocking oligonucleotide that comprises an LNA/DNA chimera or a PNA/DNA chimera can have a percentage of DNA residues that is, is about, is less than, or is more than 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, or a range between any two of the above values.

In some embodiments, a blocking oligonucleotide can be designed to be incapable of functioning as a primer for an extension or amplification. For example, the blocking oligonucleotide may be incapable of functioning as a primer for a reverse transcriptase, a polymerase, or both. For example, a blocking oligonucleotide that comprises an LNA/DNA chimera or a PNA/DNA chimera can be designed to have a certain percentage of LNA or PNA residues, or to have LNA or PNA residues at certain location(s), such as the 3′ end, the 5′ end, the internal region, or a combination thereof of the blocking oligonucleotide. In some embodiments, a blocking oligonucleotide that comprises an LNA/DNA chimera or a PNA/DNA chimera can have a percentage of LNA or PNA residues that is, is about, is less than, or is more than 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, or a range between any two of the above values.
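The composition of a chimeric blocking oligonucleotide can be summarized programmatically. The sketch below assumes a text notation, chosen for this illustration only, in which a `+` prefix marks an LNA residue and an unprefixed base is DNA, written 5′ to 3′. It reports the percentage of LNA residues and whether the 3′-terminal residue is LNA. It only describes composition and makes no prediction about whether a given enzyme can extend the oligonucleotide.

```python
def parse_chimera(oligo):
    """Parse a 5'->3' string into (base, is_lna) pairs; '+' marks the next base as LNA."""
    residues = []
    i = 0
    while i < len(oligo):
        is_lna = oligo[i] == "+"
        if is_lna:
            i += 1  # skip the marker and read the base it modifies
        if i >= len(oligo) or oligo[i].upper() not in "ACGT":
            raise ValueError("expected a base (A, C, G or T) at position %d" % i)
        residues.append((oligo[i].upper(), is_lna))
        i += 1
    if not residues:
        raise ValueError("empty oligonucleotide")
    return residues


def lna_percentage(residues):
    """Percentage of residues that are LNA."""
    return 100.0 * sum(1 for _, is_lna in residues if is_lna) / len(residues)


def three_prime_is_lna(residues):
    """True if the 3'-terminal residue is LNA."""
    return residues[-1][1]


# Hypothetical 8-residue LNA/DNA chimera with 5 LNA residues, including the 3' end.
blocker = parse_chimera("+AC+GTA+C+G+T")
```

For this hypothetical sequence, `lna_percentage(blocker)` is 62.5 and `three_prime_is_lna(blocker)` is `True`.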

In some embodiments, the blocking oligonucleotides can comprise an affinity moiety. The affinity moiety can be a functional group selected from the group consisting of biotin, streptavidin, heparin, an aptamer, a click-chemistry moiety, digoxigenin, primary amine(s), carboxyl(s), hydroxyl(s), aldehyde(s), ketone(s), and any combination thereof. In some embodiments, the blocking oligonucleotides can be immobilized to a solid support having a binding partner for the affinity moiety through the affinity moiety.

In some embodiments, each of the oligonucleotide probes can comprise a molecular label, a cell label, a sample label, or any combination thereof. In some embodiments, each of the oligonucleotides can comprise a linker. In some embodiments, each of the oligonucleotide probes can comprise a binding site for an oligonucleotide probe, such as a poly A tail. For example, the poly A tail can be, e.g., oligodA₁₈ (unanchored to a solid support) or oligoA₁₈V (anchored to a solid support). The oligonucleotide probes can comprise DNA residues, RNA residues, or both.

In some embodiments, the kits can further comprise an enzyme. In some embodiments, the enzyme can be a reverse transcriptase, a polymerase, a ligase, a nuclease, or any combination thereof.

Stochastic Barcodes

The oligonucleotide probes disclosed herein can comprise, or consist of, stochastic barcodes. As disclosed herein, a stochastic barcode can be a polynucleotide sequence that may be used to stochastically label (e.g., barcode, tag) a target. A stochastic barcode can comprise one or more labels. Exemplary labels include, but are not limited to, universal labels, cell labels, molecular labels, sample labels, plate labels, spatial labels, pre-spatial labels, and any combination thereof. A stochastic barcode can comprise a 5′ amine that may link the stochastic barcode to a solid support. The stochastic barcode can comprise one or more of a universal label, a dimension label, a spatial label, a cell label, and a molecular label. The universal label can be the 5′-most label. The molecular label can be the 3′-most label. The spatial label, dimension label, and the cell label can be in any order. In some instances, the universal label, the spatial label, the dimension label, the cell label, and the molecular label are in any order. The stochastic barcode can comprise a target-binding region. The target-binding region can interact with a target (e.g., target nucleic acid, RNA, mRNA, DNA) in a sample. For example, a target-binding region can comprise an oligo dT sequence which can interact with poly-A tails of mRNAs. In some instances, the labels of the stochastic barcode (e.g., universal label, dimension label, spatial label, cell label, and molecular label) may be separated by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, a range between any two of these values, or more nucleotides.
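One possible arrangement of these elements can be sketched as a simple data structure. The layout below (universal label at the 5′ end, then cell label, then molecular label, followed by an oligo dT target-binding region at the 3′ end) is only one of the orderings contemplated above. The label lengths, the universal label sequence, the 18-nt oligo dT, and all names in the code are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class StochasticBarcode:
    universal_label: str                  # 5'-most label, shared by all barcodes
    cell_label: str                       # shared by all barcodes on one bead
    molecular_label: str                  # 3'-most label, differs between barcodes
    target_binding_region: str = "T" * 18 # oligo dT, assumed 18 nt for this sketch

    def sequence(self):
        """Full 5'->3' sequence under the assumed ordering."""
        return (self.universal_label + self.cell_label
                + self.molecular_label + self.target_binding_region)


def random_label(length, rng):
    """Random DNA label of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def barcodes_for_bead(universal_label, n_barcodes, rng, cell_len=8, mol_len=8):
    """Barcodes for one bead: one shared cell label, a random molecular label each."""
    cell_label = random_label(cell_len, rng)
    return [StochasticBarcode(universal_label, cell_label, random_label(mol_len, rng))
            for _ in range(n_barcodes)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
bead = barcodes_for_bead("ACGTACGTACGTACGTACGT", 5, rng)  # placeholder universal label
```

Every barcode in `bead` shares the same universal label and cell label, while the molecular labels are drawn independently at random, so they can coincide by chance when few distinct labels are available.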

A stochastic barcode can comprise one or more universal labels. The one or more universal labels can be the same for all stochastic barcodes in the set of stochastic barcodes (e.g., attached to a given solid support). In some embodiments, the one or more universal labels can be the same for all stochastic barcodes attached to a plurality of beads. In some embodiments, a universal label comprises a nucleic acid sequence that is capable of hybridizing to a sequencing primer. Sequencing primers can be used for sequencing stochastic barcodes comprising a universal label. Sequencing primers (e.g., universal sequencing primers) can comprise sequencing primers associated with high-throughput sequencing platforms. In some embodiments, a universal label may comprise a nucleic acid sequence that is capable of hybridizing to a PCR primer. In some embodiments, the universal label comprises a nucleic acid sequence that is capable of hybridizing to a sequencing primer and a PCR primer. The nucleic acid sequence of the universal label that is capable of hybridizing to a sequencing or PCR primer may be referred to as a primer binding site. A universal label can comprise a sequence that may be used to initiate transcription of the stochastic barcode. A universal label can comprise a sequence that may be used for extension of the stochastic barcode or a region within the stochastic barcode. A universal label can be, or be at least about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. A universal label can comprise at least about 10 nucleotides. A universal label can be at most about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides in length. In some embodiments, a cleavable linker or modified nucleotide is part of the universal label sequence to enable the stochastic barcode to be cleaved off from the support. As used herein, a universal label can be used interchangeably with “universal PCR primer.”

A stochastic barcode can comprise a dimension label. A dimension label can comprise a nucleic acid sequence that provides information about a dimension in which the stochastic labeling occurred. For example, a dimension label can provide information about the time at which a target was stochastically barcoded. A dimension label can be associated with a time of stochastic barcoding in a sample. A dimension label can be activated at the time of stochastic labeling. Different dimension labels can be activated at different times. The dimension label provides information about the order in which targets, groups of targets, and/or samples were stochastically barcoded. For example, a population of cells can be stochastically barcoded at the G0 phase of the cell cycle. The cells can be pulsed again with stochastic barcodes at the G1 phase of the cell cycle. The cells can be pulsed again with stochastic barcodes at the S phase of the cell cycle, and so on. Stochastic barcodes at each pulse (e.g., each phase of the cell cycle) can comprise different dimension labels. In this way, the dimension label provides information about which targets were labelled at which phase of the cell cycle. Dimension labels can interrogate many different biological times. Exemplary biological times can include, but are not limited to, the cell cycle, transcription (e.g., transcription initiation), and transcript degradation. In another example, a sample (e.g., a cell, a population of cells) can be stochastically labeled before and/or after treatment with a drug and/or therapy. The changes in the number of copies of distinct targets can be indicative of the sample's response to the drug and/or therapy.

In some embodiments, a dimension label is activatable. An activatable dimension label can be activated, for example, at a specific timepoint. The activatable dimension label can be constitutively activated (e.g., not turned off). The activatable dimension label can be reversibly activated (e.g., the activatable dimension label can be turned on and turned off). The dimension label can be reversibly activatable at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more times. The dimension label can be reversibly activatable 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more times. For example, the dimension label can be activated with fluorescence, light, a chemical event (e.g., cleavage, ligation of another molecule, or addition of modifications (e.g., pegylated, sumoylated, acetylated, methylated, deacetylated, demethylated)), a photochemical event (e.g., photocaging), or introduction of a non-natural nucleotide.

The dimension label can be identical for all stochastic barcodes attached to a given solid support (e.g., bead), but different for different solid supports (e.g., beads). In some embodiments, at least 60%, 70%, 80%, 85%, 90%, 95%, 97%, 99% or 100% of stochastic barcodes on the same solid support comprise the same dimension label. In some embodiments, at least 60% of stochastic barcodes on the same solid support comprise the same dimension label. In some embodiments, at least 95% of stochastic barcodes on the same solid support comprise the same dimension label.

There can be as many as 10⁶ or more unique dimension label sequences represented in a plurality of solid supports (e.g., beads). A dimension label can, for example, be or be at least about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. A dimension label can be at most about 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4 or fewer nucleotides in length. A dimension label can, for example, be about 5 to about 200 nucleotides, or about 10 to about 150 nucleotides in length. In some embodiments, a dimension label is from about 20 to about 125 nucleotides in length.

A stochastic barcode can comprise a spatial label. A spatial label can comprise a nucleic acid sequence that provides information about the spatial orientation of a target molecule which is associated with the stochastic barcode. A spatial label can be associated with a coordinate in a sample. The coordinate can be a fixed coordinate. For example, a coordinate can be fixed in reference to a substrate. A spatial label can be in reference to a two- or three-dimensional grid. A coordinate can be fixed in reference to a landmark. The landmark can be identifiable in space. A landmark can be a structure which can be imaged. A landmark can be a biological structure, for example an anatomical landmark. A landmark can be a cellular landmark, for instance an organelle. A landmark can be a non-natural landmark such as a structure with an identifier such as a color code, bar code, magnetic property, fluorescence, radioactivity, or a unique size or shape. A spatial label can be associated with a physical partition (e.g., a well, a container, or a droplet). In some instances, multiple spatial labels are used together to encode one or more positions in space.

The spatial label can be identical for all stochastic barcodes attached to a given solid support (e.g., bead), but different for different solid supports (e.g., beads). In some embodiments, at least 60%, 70%, 80%, 85%, 90%, 95%, 97%, 99% or 100% of stochastic barcodes on the same solid support comprise the same spatial label. In some embodiments, at least 60% of stochastic barcodes on the same solid support comprise the same spatial label. In some embodiments, at least 95% of stochastic barcodes on the same solid support comprise the same spatial label.

There can be as many as 10⁶ or more unique spatial label sequences represented in a plurality of solid supports (e.g., beads). A spatial label can be, or be at least about, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. In some embodiments, a spatial label is at most about 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 12, 10, 9, 8, 7, 6, 5, or 4 nucleotides in length. A spatial label can be, for example, from about 5 to about 200 nucleotides in length. A spatial label can be, for example, from about 10 to about 150 nucleotides in length. A spatial label can be from about 20 to about 125 nucleotides in length.

Stochastic barcodes can comprise a cell label. A cell label can comprise a nucleic acid sequence that provides information for determining which target nucleic acid originated from which cell. In some embodiments, the cell label is identical for all stochastic barcodes attached to a given solid support (e.g., bead), but different for different solid supports (e.g., beads). In some embodiments, at least 60%, 70%, 80%, 85%, 90%, 95%, 97%, 99% or 100% of stochastic barcodes on the same solid support comprise the same cell label. In some embodiments, at least 60% of stochastic barcodes on the same solid support comprise the same cell label. In some embodiments, at least 95% of stochastic barcodes on the same solid support comprise the same cell label.

There can be as many as 10⁶ or more unique cell label sequences represented in a plurality of solid supports (e.g., beads). A cell label may be at least about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. A cell label can be, or be at most about, 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4 or fewer nucleotides in length. A cell label can be, for example, from about 5 to about 200 nucleotides in length. A cell label can be, for example, from about 10 to about 150 nucleotides in length. A cell label can be, for example, from about 20 to about 125 nucleotides in length.
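The relationship between label length and the number of unique label sequences follows from the four-letter nucleotide alphabet: a label of length n can take at most 4ⁿ distinct sequences. The sketch below computes the shortest length that can provide a requested number of unique labels. The function name is hypothetical, and the calculation is a lower bound only; it ignores any extra length that a practical design might add, for example for error correction.

```python
def min_label_length(n_labels, alphabet_size=4):
    """Smallest length L such that alphabet_size ** L >= n_labels."""
    if n_labels < 1:
        raise ValueError("n_labels must be at least 1")
    length = 0
    capacity = 1  # number of distinct sequences of the current length
    while capacity < n_labels:
        capacity *= alphabet_size
        length += 1
    return length


# 4**9 = 262,144 is too few for 10**6 labels; 4**10 = 1,048,576 is enough.
length_for_million = min_label_length(10**6)
```

Under these assumptions, 10⁶ unique labels require at least 10 nucleotides, which is consistent with the label lengths recited above.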

Stochastic barcodes can comprise a molecular label. A molecular label can comprise a nucleic acid sequence that provides identifying information for the specific type of target nucleic acid species hybridized to the stochastic barcode. A molecular label can comprise a nucleic acid sequence that provides a counter for the specific occurrence of the target nucleic acid species hybridized to the stochastic barcode (e.g., target-binding region). In some embodiments, a diverse set of molecular labels are attached to a given solid support (e.g., bead). In some embodiments, there can be as many as 10⁶ or more unique molecular label sequences attached to a given solid support (e.g., bead). In some embodiments, there can be as many as 10⁵ or more unique molecular label sequences attached to a given solid support (e.g., bead). In some embodiments, there can be as many as 10⁴ or more unique molecular label sequences attached to a given solid support (e.g., bead). In some embodiments, there can be as many as 10³ or more unique molecular label sequences attached to a given solid support (e.g., bead). In some embodiments, there can be as many as 10² or more unique molecular label sequences attached to a given solid support (e.g., bead). A molecular label can be at least about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. A molecular label can be at most about 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 15, 12, 10, 9, 8, 7, 6, 5, 4 or fewer nucleotides in length.

Stochastic barcodes can comprise a target binding region. In some embodiments, the target binding regions comprise a nucleic acid sequence that hybridizes specifically to a target (e.g., target nucleic acid, target molecule, e.g., a cellular nucleic acid to be analyzed), for example to a specific gene sequence. In some embodiments, a target binding region comprises a nucleic acid sequence that may attach (e.g., hybridize) to a specific location of a specific target nucleic acid. In some embodiments, the target binding region comprises a nucleic acid sequence that is capable of specific hybridization to a restriction site overhang (e.g., an EcoRI sticky-end overhang). The stochastic barcode may then ligate to any nucleic acid molecule comprising a sequence complementary to the restriction site overhang.

A stochastic barcode can comprise a target-binding region. A target-binding region can hybridize with a target of interest. For example, a target-binding region can comprise an oligo dT which can hybridize with mRNAs comprising poly-adenylated ends. A target-binding region can be gene-specific. For example, a target-binding region can be configured to hybridize to a specific region of a target. A target-binding region can be, or be at least, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 or more nucleotides in length. A target-binding region can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length. A target-binding region can be from 5-30 nucleotides in length. When a stochastic barcode comprises a gene-specific target-binding region, the stochastic barcode can be referred to as a gene-specific stochastic barcode.

A target binding region can comprise a non-specific target nucleic acid sequence. A non-specific target nucleic acid sequence can refer to a sequence that may bind to multiple target nucleic acids, independent of the specific sequence of the target nucleic acid. For example, a target binding region can comprise a random multimer sequence, or an oligo-dT sequence that hybridizes to the poly-A tail on mRNA molecules. A random multimer sequence can be, for example, a random dimer, trimer, tetramer, pentamer, hexamer, heptamer, octamer, nonamer, decamer, or higher multimer sequence of any length. In some embodiments, the target binding region is the same for all stochastic barcodes attached to a given bead. In some embodiments, the target binding regions for the plurality of stochastic barcodes attached to a given bead comprise two or more different target binding sequences. A target binding region can be, or be at least about, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. In some embodiments, a target binding region is at most about 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides in length.

A stochastic barcode can comprise an orientation property which can be used to orient (e.g., align) the stochastic barcodes. A stochastic barcode can comprise a moiety for isoelectric focusing. Different stochastic barcodes can comprise different isoelectric focusing points. When these stochastic barcodes are introduced to a sample, the sample can undergo isoelectric focusing in order to orient the stochastic barcodes in a known way. In this way, the orientation property can be used to develop a known map of stochastic barcodes in a sample. Exemplary orientation properties include, but are not limited to, electrophoretic mobility (e.g., based on size of the stochastic barcode), isoelectric point, spin, conductivity, and/or self-assembly. For example, stochastic barcodes that comprise an orientation property of self-assembly can self-assemble into a specific orientation (e.g., nucleic acid nanostructure) upon activation.

The cell label and/or any label of the disclosure can further comprise a unique set of nucleic acid sub-sequences of defined length, e.g., 7 nucleotides each (equivalent to the number of bits used in some Hamming error correction codes), which are designed to provide error correction capability. Hamming codes, like other error-correcting codes, are based on the principle of redundancy and can be constructed by adding redundant parity bits to data that is to be transmitted over a noisy medium. Such error-correcting codes can encode sample identifiers with redundant parity bits, and “transmit” these sample identifiers as codewords. A Hamming code can refer to an arithmetic process that identifies unique binary codes based upon inherent redundancy that are capable of correcting single bit errors. For example, a Hamming code can be matched with a nucleic acid barcode in order to screen for single nucleotide errors occurring during nucleic acid amplification. The identification of a single nucleotide error by using a Hamming code can thereby allow for the correction of the nucleic acid barcode.
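The single-error correction described above can be illustrated with the textbook binary Hamming(7,4) code, which adds three parity bits to four data bits. The sketch below is a generic illustration of that code and not necessarily the encoding used for any label of the disclosure; in particular, how bits would be mapped onto nucleotides is left open, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word):
    """Return (corrected word, syndrome); the syndrome is the 1-based error position or 0."""
    syndrome = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= position  # XOR of the positions of all set bits
    corrected = list(word)
    if syndrome:
        corrected[syndrome - 1] ^= 1  # flip the bit the syndrome points to
    return corrected, syndrome


def hamming74_decode(word):
    """Correct up to one flipped bit and return the 4 data bits."""
    corrected, _ = hamming74_correct(word)
    return [corrected[2], corrected[4], corrected[5], corrected[6]]


codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1                      # simulate a single error at position 5
recovered = hamming74_decode(damaged)
```

In this sketch `recovered` equals the original data bits `[1, 0, 1, 1]`. The same procedure recovers the data for any single flipped bit, while two flipped bits in one codeword are corrected to the wrong codeword.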

Hamming codes can be represented by a subset of the possible codewords that are chosen from the center of multidimensional spheres (i.e., for example, hyperspheres) in a binary subspace. Single bit errors may fall within hyperspheres associated with a specific codeword and can thus be corrected. On the other hand, double bit errors that do not associate with a specific codeword can be detected, but not corrected. Consider a first hypersphere centered at coordinates (0, 0, 0) (i.e., for example, using an x-y-z coordinate system), wherein any single-bit error can be corrected by falling within a radius of 1 from the center coordinates; i.e., for example, words having the coordinates of (0, 0, 0); (0, 1, 0); (0, 0, 1); or (1, 0, 0). Likewise, a second hypersphere may be constructed wherein single-bit errors can be corrected by falling within a radius of 1 of its center coordinates (1, 1, 1) (i.e., for example, (1, 1, 1); (1, 0, 1); (1, 1, 0); or (0, 1, 1)).
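The hypersphere picture can be made concrete for the two three-bit codewords (0, 0, 0) and (1, 1, 1). The sketch below, with hypothetical function names, enumerates every three-bit word within Hamming distance 1 of each center and corrects a received word by assigning it to the nearest center.

```python
from itertools import product

CODEWORDS = [(0, 0, 0), (1, 1, 1)]


def hamming_distance(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def sphere(center, radius=1):
    """All binary words of the same length within `radius` of `center`."""
    return [w for w in product((0, 1), repeat=len(center))
            if hamming_distance(w, center) <= radius]


def correct(word):
    """Assign a received word to the nearest codeword."""
    return min(CODEWORDS, key=lambda c: hamming_distance(word, c))


sphere_000 = sphere((0, 0, 0))
sphere_111 = sphere((1, 1, 1))
```

Each sphere contains four words, the center itself plus the three words that differ from it in exactly one bit, and the two spheres do not overlap. For example, `correct((0, 1, 0))` returns `(0, 0, 0)` and `correct((1, 0, 1))` returns `(1, 1, 1)`.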

In some embodiments, the length of the nucleic acid sub-sequences used for creating error correction codes can vary, for example, they can be at least 3 nucleotides, at least 7 nucleotides, at least 15 nucleotides, or at least 31 nucleotides in length. In some embodiments, nucleic acid sub-sequences of other lengths can be used for creating error correction codes.

When a stochastic barcode comprises more than one of a type of label (e.g., more than one cell label or more than one molecular label), the labels may be interspersed with a linker label sequence. A linker label sequence can be at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. A linker label sequence can be at most about 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 nucleotides in length. In some instances, a linker label sequence is 12 nucleotides in length. A linker label sequence can be used to facilitate the synthesis of the stochastic barcode. The linker label can comprise an error-correcting (e.g., Hamming) code.

Solid Supports

The oligonucleotide probes (e.g., stochastic barcodes) and/or the blocking oligonucleotides disclosed herein can, in some embodiments, be attached to a solid support (e.g., bead, substrate, microwell(s), microwell arrays). As used herein, the terms “tethered”, “attached”, and “immobilized” are used interchangeably, and refer to covalent or non-covalent means for attaching a compound (e.g., an oligonucleotide) to a solid support. Any of a variety of different solid supports may be used as solid supports for attaching pre-synthesized combinatorial barcode reagents or for in situ solid-phase synthesis of combinatorial barcode reagents.

The solid support can be or comprise, for example, a particle or a plurality of particles. The particles can be, for example, nanoparticles, microparticles, or the like. In some embodiments, a solid support is, or comprises, a bead or a plurality of beads. The particle (e.g., the bead) can encompass any type of solid, porous, or hollow sphere, ball, bearing, cylinder, or other similar configuration composed of plastic, ceramic, metal, or polymeric material onto which a nucleic acid may be immobilized (e.g., covalently or non-covalently). The particle (e.g., the bead) can comprise a discrete particle that may be spherical (e.g., microspheres) or have a non-spherical or irregular shape, such as cubic, cuboid, pyramidal, cylindrical, conical, oblong, or disc-shaped, and the like. The particle (e.g., the bead) can be spherical, substantially spherical, or non-spherical in shape.

The particle (e.g., bead) can comprise a variety of materials including, but not limited to, paramagnetic materials (e.g., magnesium, molybdenum, lithium, and tantalum), superparamagnetic materials (e.g., ferrite (Fe₃O₄; magnetite) nanoparticles), ferromagnetic materials (e.g., iron, nickel, cobalt, some alloys thereof, and some rare earth metal compounds), ceramic, plastic, glass, polystyrene, silica, methylstyrene, acrylic polymers, titanium, latex, sepharose, agarose, hydrogel, polymer, cellulose, nylon, and any combination thereof.

The diameter of the particle (e.g., the bead) can be, or be at least about, 5 μm, 10 μm, 20 μm, 25 μm, 30 μm, 35 μm, 40 μm, 45 μm, 50 μm, or a range between any two of these values. The diameter of the particle (e.g., the bead) can be, for example, at most about 5 μm, 10 μm, 20 μm, 25 μm, 30 μm, 35 μm, 40 μm, 45 μm or 50 μm. The diameter of the particle (e.g., the bead) can be related to the diameter of the wells of the substrate. For example, the diameter of the particle (e.g., bead) can be, or be at least, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% longer or shorter than the diameter of the well. In some embodiments, the diameter of the particle (e.g., bead) can be at most 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100% longer or shorter than the diameter of the well. The diameter of the particle (e.g., bead) can be related to the diameter of a cell (e.g., a single cell entrapped by a well of the substrate). The diameter of the particle (e.g., bead) can be, or be at least, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, or 300% or more longer or shorter than the diameter of the cell. In some embodiments, the diameter of the particle (e.g., bead) can be at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, or 300% longer or shorter than the diameter of the cell.

A particle (e.g., bead) can be attached to and/or embedded in a substrate. For example, the particle (e.g., bead) can be attached to and/or embedded in a gel, hydrogel, polymer and/or matrix. The spatial position of the particle (e.g., bead) within a substrate (e.g., gel, matrix, scaffold, or polymer) can be identified, in some embodiments, using the spatial label present on the stochastic barcode on the bead, which can serve as a location address.

Examples of the particles (e.g., beads) can include, but are not limited to, streptavidin beads, agarose beads, magnetic beads, Dynabeads®, MACS® microbeads, antibody conjugated beads (e.g., anti-immunoglobulin microbead), protein A conjugated beads, protein G conjugated beads, protein A/G conjugated beads, protein L conjugated beads, oligodT conjugated beads, silica beads, silica-like beads, anti-biotin microbead, anti-fluorochrome microbead, and BcMag™ Carboxy-Terminated Magnetic Beads.

A particle (e.g., bead) can be associated with (e.g. impregnated with) quantum dots or fluorescent dyes to make it fluorescent in one fluorescence optical channel or multiple optical channels. A bead can be associated with iron oxide or chromium oxide to make it paramagnetic or ferromagnetic. The particles (e.g., beads) can be identifiable. A particle (e.g., bead) can be imaged using a camera. A particle (e.g., bead) can have a detectable code associated with the bead. For example, a bead can comprise an RFID tag. A bead can comprise any detectable tag (e.g., UPC code, electronic barcode, etched identifier). A particle (e.g., bead) can change size, for example due to swelling in an organic or inorganic solution. A bead can be hydrophobic or hydrophilic. A particle (e.g., bead) can be biocompatible.

A solid support (e.g., bead) can be visualized. The solid support can comprise a visualizing tag (e.g., fluorescent dye). A solid support (e.g., bead) can be etched with an identifier (e.g., a number). The identifier can be visualized through imaging the solid supports (e.g., beads).

A solid support (e.g., bead) can comprise an insoluble or semi-soluble material. A solid support can be referred to as “functionalized” when it includes a linker, a scaffold, a building block, or other reactive moiety attached thereto, whereas a solid support may be “nonfunctionalized” when it lacks such a reactive moiety attached thereto. The solid support can be employed free in solution, such as in a microtiter well format; in a flow-through format, such as in a column; or in a dipstick.

The solid support can comprise a membrane, paper, plastic, coated surface, flat surface, glass, slide, chip, or any combination thereof. A solid support can take the form of resins, gels, microspheres, or other geometric configurations. A solid support can comprise silica chips, microparticles, nanoparticles, plates, arrays, capillaries, flat supports such as glass fiber filters, glass surfaces, metal surfaces (steel, gold, silver, aluminum, silicon and copper), glass supports, plastic supports, silicon supports, chips, filters, membranes, microwell plates, slides, plastic materials including multiwell plates or membranes (e.g., formed of polyethylene, polypropylene, polyamide, polyvinylidenedifluoride), and/or wafers, combs, pins or needles (e.g., arrays of pins suitable for combinatorial synthesis or analysis) or beads in an array of pits or nanoliter wells of flat surfaces such as wafers (e.g., silicon wafers), wafers with pits with or without filter bottoms.

The solid support can comprise a polymer matrix (e.g., gel, hydrogel). The polymer matrix may be able to permeate intracellular space (e.g., around organelles). The polymer matrix may be able to be pumped throughout the circulatory system.

A solid support can comprise, or be, a biological molecule. For example, a solid support can be a nucleic acid, a protein, an antibody, a histone, a cellular compartment, a lipid, a carbohydrate, and the like. Solid supports that are biological molecules can be amplified, translated, transcribed, degraded, and/or modified (e.g., pegylated, sumoylated, acetylated, methylated). A solid support that is a biological molecule can provide spatial and time information in addition to the spatial label that is attached to the biological molecule. For example, a biological molecule can comprise a first conformation when unmodified, but can change to a second conformation when modified. The different conformations can expose stochastic barcodes of the disclosure to targets. For example, a biological molecule can comprise stochastic barcodes that are inaccessible due to folding of the biological molecule. Upon modification of the biological molecule (e.g., acetylation), the biological molecule can change conformation to expose the stochastic labels. The timing of the modification can provide another time dimension to the method of stochastic barcoding of the disclosure.

In some embodiments, the biological molecule comprising combinatorial barcode reagents of the disclosure can be located in the cytoplasm of a cell. Upon activation, the biological molecule can move to the nucleus, whereupon stochastic barcoding can take place. In this way, modification of the biological molecule can encode additional space-time information for the targets identified by the stochastic barcodes.

A dimension label can provide information about space-time of a biological event (e.g., cell division). For example, a dimension label can be added to a first cell, where the first cell can divide, generating a second daughter cell; the second daughter cell can comprise all, some, or none of the dimension labels. The dimension labels can be activated in the original cell and the daughter cell. In this way, the dimension label can provide information about time of combinatorial barcoding in distinct spaces.

Substrates

As used herein, a substrate can refer to a type of solid support. A substrate can refer to a solid support that can comprise combinatorial barcode reagents of the disclosure. A substrate can comprise a plurality of microwells. A microwell can comprise a small reaction chamber of defined volume. A microwell can entrap one or more cells. A microwell can entrap only one cell. A microwell can entrap one or more solid supports. A microwell can entrap only one solid support. In some instances, a microwell entraps a single cell and a single solid support (e.g., bead). A microwell can comprise combinatorial barcode reagents of the disclosure.

The microwells of the array can be fabricated in a variety of shapes and sizes. Well geometries can include, but are not limited to, cylindrical, conical, hemispherical, rectangular, or polyhedral (e.g., three dimensional geometries comprised of several planar faces, for example, hexagonal columns, octagonal columns, inverted triangular pyramids, inverted square pyramids, inverted pentagonal pyramids, inverted hexagonal pyramids, or inverted truncated pyramids). The microwells can comprise a shape that combines two or more of these geometries. For example, a microwell can be partly cylindrical, with the remainder having the shape of an inverted cone. A microwell can include two side-by-side cylinders, one of larger diameter (e.g. that corresponds roughly to the diameter of the beads) than the other (e.g. that corresponds roughly to the diameter of the cells), that are connected by a vertical channel (that is, parallel to the cylinder axes) that extends the full length (depth) of the cylinders. The opening of the microwell can be at the upper surface of the substrate. The opening of the microwell can be at the lower surface of the substrate. The closed end (or bottom) of the microwell can be flat. The closed end (or bottom) of the microwell can have a curved surface (e.g., convex or concave). The shape and/or size of the microwell can be determined based on the types of cells or solid supports to be trapped within the microwells.

The portion of the substrate between the wells can have a topology. For example, the portion of the substrate between the wells can be rounded. The portion of the substrate between the wells can be pointed. The spacing portion of the substrate between the wells can be flat. The portion of the substrate between the wells may not be flat. In some instances, the portion of the substrate between wells is rounded. In other words, the portion of the substrate that does not comprise a well can have a curved surface. The curved surface can be fabricated such that the highest point (e.g., apex) of the curved surface may be at the furthest point between the edges of two or more wells (e.g., equidistant from the wells). The curved surface can be fabricated such that the start of the curved surface is at the edge of a first microwell and creates a parabola that ends at the end of a second microwell. This parabola can be extended in 2 dimensions to capture microwells nearby on the hexagonal grid of wells. The curved surface can be fabricated such that the surface between the wells is higher than, and/or curved relative to, the plane of the opening of the well. The height of the curved surface can be, or be at least, 0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, or 7 or more micrometers. In some embodiments, the height of the curved surface can be at most 0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, or 7 or more micrometers.

Microwell dimensions can be characterized in terms of the diameter and depth of the well. As used herein, the diameter of the microwell refers to the largest circle that can be inscribed within the planar cross-section of the microwell geometry. The diameter of the microwells can range from about 1-fold to about 10-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell diameter can be, or be at least, 1-fold, at least 1.5-fold, at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, or at least 10-fold the diameter of the cells or solid supports to be trapped within the microwells. In some embodiments, the microwell diameter can be at most 10-fold, at most 5-fold, at most 4-fold, at most 3-fold, at most 2-fold, at most 1.5-fold, or at most 1-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell diameter can be about 2.5-fold the diameter of the cells or solid supports to be trapped within the microwells.

The diameter of the microwells can be specified in terms of absolute dimensions. The diameter of the microwells can range from about 5 to about 60 micrometers. The microwell diameter can be, or be at least, 5 micrometers, at least 10 micrometers, at least 15 micrometers, at least 20 micrometers, at least 25 micrometers, at least 30 micrometers, at least 35 micrometers, at least 40 micrometers, at least 45 micrometers, at least 50 micrometers, or at least 60 micrometers. The microwell diameter can be at most 60 micrometers, at most 50 micrometers, at most 45 micrometers, at most 40 micrometers, at most 35 micrometers, at most 30 micrometers, at most 25 micrometers, at most 20 micrometers, at most 15 micrometers, at most 10 micrometers, or at most 5 micrometers. The microwell diameter can be about 30 micrometers.

The microwell depth may be chosen to provide efficient trapping of cells and solid supports. The microwell depth may be chosen to provide efficient exchange of assay buffers and other reagents contained within the wells. The ratio of diameter to height (i.e. aspect ratio) may be chosen such that once a cell and solid support settle inside a microwell, they will not be displaced by fluid motion above the microwell. The dimensions of the microwell may be chosen such that the microwell has sufficient space to accommodate a solid support and a cell of various sizes without being dislodged by fluid motion above the microwell. The depth of the microwells can range from about 1-fold to about 10-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell depth can be, or be at least, 1-fold, at least 1.5-fold, at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, or at least 10-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell depth can be at most 10-fold, at most 5-fold, at most 4-fold, at most 3-fold, at most 2-fold, at most 1.5-fold, or at most 1-fold the diameter of the cells or solid supports to be trapped within the microwells. The microwell depth can be about 2.5-fold the diameter of the cells or solid supports to be trapped within the microwells.

The depth of the microwells can be specified in terms of absolute dimensions. The depth of the microwells may range from about 10 to about 60 micrometers. The microwell depth can be, or be at least, 10 micrometers, at least 20 micrometers, at least 25 micrometers, at least 30 micrometers, at least 35 micrometers, at least 40 micrometers, at least 50 micrometers, or at least 60 micrometers. The microwell depth can be at most 60 micrometers, at most 50 micrometers, at most 40 micrometers, at most 35 micrometers, at most 30 micrometers, at most 25 micrometers, at most 20 micrometers, or at most 10 micrometers. The microwell depth can be about 30 micrometers.

The volume of the microwells used in the methods, devices, and systems of the present disclosure can range from about 200 micrometers³ to about 120,000 micrometers³. The microwell volume can be at least 200 micrometers³, at least 500 micrometers³, at least 1,000 micrometers³, at least 10,000 micrometers³, at least 25,000 micrometers³, at least 50,000 micrometers³, at least 100,000 micrometers³, or at least 120,000 micrometers³. The microwell volume can be at most 120,000 micrometers³, at most 100,000 micrometers³, at most 50,000 micrometers³, at most 25,000 micrometers³, at most 10,000 micrometers³, at most 1,000 micrometers³, at most 500 micrometers³, or at most 200 micrometers³. The microwell volume can be about 25,000 micrometers³. The microwell volume may fall within any range bounded by any of these values (e.g. from about 18,000 micrometers³ to about 30,000 micrometers³).
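As an illustration of how the stated diameter, depth, and volume figures relate, the following minimal sketch computes the volume of a microwell under the assumption that the well is a simple right cylinder (only one of the geometries described above). The function names are hypothetical and are not part of the disclosure; the only outside fact used is the unit identity 1 picoliter = 1,000 micrometers³.

```python
import math

# Unit identity: 1 picoliter = 1,000 cubic micrometers.
UM3_PER_PICOLITER = 1_000.0


def cylinder_volume_um3(diameter_um: float, depth_um: float) -> float:
    """Volume of an idealized cylindrical microwell, in cubic micrometers.

    Assumption: the well is a right cylinder; real wells may be conical,
    polyhedral, or a combination of shapes.
    """
    radius_um = diameter_um / 2.0
    return math.pi * radius_um ** 2 * depth_um


def um3_to_picoliters(volume_um3: float) -> float:
    """Convert cubic micrometers to picoliters."""
    return volume_um3 / UM3_PER_PICOLITER


if __name__ == "__main__":
    # A well about 30 micrometers in diameter and about 30 micrometers deep.
    volume = cylinder_volume_um3(30.0, 30.0)
    print(f"{volume:,.0f} um^3 = {um3_to_picoliters(volume):.1f} pL")
```

Under this cylindrical assumption, a well about 30 micrometers wide and about 30 micrometers deep comes to roughly 21,000 micrometers³ (about 21 picoliters), which lies inside the example range of about 18,000 to about 30,000 micrometers³.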

The volume of the microwell can be, or be at least, 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more nanoliters. The volume of the microwell can be at most 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more nanoliters. The volume of liquid that can fit in the microwell can be at least 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more nanoliters. The volume of liquid that can fit in the microwell can be at most 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more nanoliters. The volume of the microwell can be, or be at least, 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more picoliters. The volume of the microwell can be at most 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more picoliters. The volume of liquid that can fit in the microwell can be at least 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more picoliters. The volume of liquid that can fit in the microwell can be at most 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 or more picoliters.

The volumes of the microwells used in the methods, devices, and systems of the present disclosure may be further characterized in terms of the variation in volume from one microwell to another. The coefficient of variation (expressed as a percentage) for microwell volume may range from about 1% to about 10%. The coefficient of variation for microwell volume may be at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, or at least 10%. The coefficient of variation for microwell volume may be at most 10%, at most 9%, at most 8%, at most 7%, at most 6%, at most 5%, at most 4%, at most 3%, at most 2%, or at most 1%. The coefficient of variation for microwell volume may have any value within a range encompassed by these values, for example between about 1.5% and about 6.5%. In some embodiments, the coefficient of variation of microwell volume may be about 2.5%.
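For illustration only, the coefficient of variation referred to above can be computed from a set of measured well volumes as the standard deviation divided by the mean, expressed as a percentage. The sketch below uses the sample standard deviation from Python's standard library; the disclosure does not say whether a sample or population estimate is intended, so that choice, the function name, and the example volumes are all assumptions.

```python
import statistics
from typing import Sequence


def coefficient_of_variation_percent(volumes_um3: Sequence[float]) -> float:
    """Coefficient of variation of microwell volumes, as a percentage.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    Requires at least two measurements and a non-zero mean.
    """
    mean_volume = statistics.mean(volumes_um3)
    return 100.0 * statistics.stdev(volumes_um3) / mean_volume


if __name__ == "__main__":
    # Hypothetical measured volumes for three wells, in cubic micrometers.
    measured = [24_000.0, 25_000.0, 26_000.0]
    print(f"CV = {coefficient_of_variation_percent(measured):.1f}%")
```

For these three hypothetical volumes the mean is 25,000 micrometers³ and the sample standard deviation is 1,000 micrometers³, giving a coefficient of variation of 4%.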

The ratio of the volume of the microwells to the surface area of the beads (or to the surface area of a solid support to which stochastic barcode oligonucleotides may be attached) used in the methods, devices, and systems of the present disclosure can range from about 2.5 to about 1,520 micrometers. The ratio can be at least 2.5, at least 5, at least 10, at least 100, at least 500, at least 750, at least 1,000, or at least 1,520. The ratio can be at most 1,520, at most 1,000, at most 750, at most 500, at most 100, at most 10, at most 5, or at most 2.5. The ratio can be about 67.5. The ratio of microwell volume to the surface area of the bead (or solid support used for immobilization) may fall within any range bounded by any of these values (e.g. from about 30 to about 120).
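The ratio described above divides a volume (micrometers³) by an area (micrometers²), which is why it carries units of micrometers. A minimal sketch of the calculation follows, assuming a smooth spherical bead whose surface area is π·d²; the function names are hypothetical, and porous or irregular supports would have a different effective surface area.

```python
import math


def sphere_surface_area_um2(bead_diameter_um: float) -> float:
    """Surface area of an idealized smooth spherical bead (pi * d^2)."""
    return math.pi * bead_diameter_um ** 2


def volume_to_surface_ratio_um(well_volume_um3: float,
                               bead_diameter_um: float) -> float:
    """Ratio of microwell volume to bead surface area, in micrometers."""
    return well_volume_um3 / sphere_surface_area_um2(bead_diameter_um)


if __name__ == "__main__":
    # Hypothetical inputs: a 25,000 um^3 well and a 10 um bead.
    ratio = volume_to_surface_ratio_um(25_000.0, 10.0)
    print(f"volume / surface area = {ratio:.1f} um")
```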

The wells of the microwell array can be arranged in a one dimensional, two dimensional, or three-dimensional array. In some embodiments, a three dimensional array can be achieved, for example, by stacking a series of two or more two dimensional arrays (that is, by stacking two or more substrates comprising microwell arrays).

The pattern and spacing between microwells can be chosen to optimize the efficiency of trapping a single cell and single solid support (e.g., bead) in each well, as well as to maximize the number of wells per unit area of the array. The microwells may be distributed according to a variety of random or non-random patterns. For example, they may be distributed entirely randomly across the surface of the array substrate, or they may be arranged in a square grid, rectangular grid, hexagonal grid, or the like. In some instances, the microwells are arranged hexagonally. The center-to-center distance (or spacing) between wells may vary from about 5 micrometers to about 75 micrometers. In some instances, the spacing between microwells is about 10 micrometers. In other embodiments, the spacing between wells is at least 5 micrometers, at least 10 micrometers, at least 15 micrometers, at least 20 micrometers, at least 25 micrometers, at least 30 micrometers, at least 35 micrometers, at least 40 micrometers, at least 45 micrometers, at least 50 micrometers, at least 55 micrometers, at least 60 micrometers, at least 65 micrometers, at least 70 micrometers, or at least 75 micrometers. The microwell spacing can be at most 75 micrometers, at most 70 micrometers, at most 65 micrometers, at most 60 micrometers, at most 55 micrometers, at most 50 micrometers, at most 45 micrometers, at most 40 micrometers, at most 35 micrometers, at most 30 micrometers, at most 25 micrometers, at most 20 micrometers, at most 15 micrometers, at most 10 micrometers, or at most 5 micrometers. The microwell spacing can be about 55 micrometers. The microwell spacing may fall within any range bounded by any of these values (e.g. from about 18 micrometers to about 72 micrometers).
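To make the trade-off between spacing and wells per unit area concrete, the sketch below compares the areal density of an ideal, unbounded square grid with that of an ideal hexagonal grid at the same center-to-center spacing. The formulas are standard lattice geometry (each well occupies s² in a square grid and (√3/2)·s² in a hexagonal grid); the function names are hypothetical, and edge effects of a finite substrate are ignored.

```python
import math

UM2_PER_MM2 = 1_000_000.0  # 1 mm^2 = 10^6 square micrometers


def wells_per_mm2_square(spacing_um: float) -> float:
    """Areal density of an ideal square grid with pitch spacing_um."""
    return UM2_PER_MM2 / (spacing_um ** 2)


def wells_per_mm2_hexagonal(spacing_um: float) -> float:
    """Areal density of an ideal hexagonal grid with pitch spacing_um.

    Each lattice point owns an area of (sqrt(3) / 2) * spacing^2.
    """
    return UM2_PER_MM2 / ((math.sqrt(3.0) / 2.0) * spacing_um ** 2)


if __name__ == "__main__":
    spacing = 55.0  # center-to-center spacing in micrometers
    print(f"square grid:    {wells_per_mm2_square(spacing):.1f} wells/mm^2")
    print(f"hexagonal grid: {wells_per_mm2_hexagonal(spacing):.1f} wells/mm^2")
```

At equal spacing the hexagonal arrangement packs about 15% more wells per unit area than the square arrangement, which is consistent with the preference for hexagonal arrays noted above.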

The microwell array may comprise surface features between the microwells that are designed to help guide cells and solid supports into the wells and/or prevent them from settling on the surfaces between wells. Examples of suitable surface features can include, but are not limited to, domed, ridged, or peaked surface features that encircle the wells or straddle the surface between wells.

The total number of wells in the microwell array can be determined by the pattern and spacing of the wells and the overall dimensions of the array. The number of microwells in the array can range from about 96 to about 5,000,000 or more. The number of microwells in the array can be at least 96, at least 384, at least 1,536, at least 5,000, at least 10,000, at least 25,000, at least 50,000, at least 75,000, at least 100,000, at least 500,000, at least 1,000,000, or at least 5,000,000. The number of microwells in the array can be at most 5,000,000, at most 1,000,000, at most 75,000, at most 50,000, at most 25,000, at most 10,000, at most 5,000, at most 1,536, at most 384, or at most 96 wells. The number of microwells in the array can be about 96, 384, and/or 1536. The number of microwells can be about 150,000. The number of microwells in the array may fall within any range bounded by any of these values (e.g. from about 100 to 325,000).
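As a rough illustration of how pattern, spacing, and overall dimensions together fix the well count, the following sketch counts the well centers of a hexagonal grid that fall inside a rectangular patterned region. It is a simplification under stated assumptions: only well centers are considered (well diameter and edge margins are ignored), rows are offset by half a pitch on alternating rows, and the function name and example dimensions are hypothetical.

```python
import math


def count_hex_grid_wells(width_um: float, height_um: float,
                         spacing_um: float) -> int:
    """Count hexagonal-grid well centers inside a width x height rectangle.

    Rows are separated by spacing * sqrt(3) / 2; odd rows are shifted by
    half the spacing. Only well centers are tested against the boundary.
    """
    row_pitch_um = spacing_um * math.sqrt(3.0) / 2.0
    n_rows = int(height_um // row_pitch_um) + 1
    total = 0
    for row in range(n_rows):
        offset_um = spacing_um / 2.0 if row % 2 else 0.0
        if offset_um > width_um:
            continue  # shifted row does not fit in a very narrow region
        total += int((width_um - offset_um) // spacing_um) + 1
    return total


if __name__ == "__main__":
    # Hypothetical 10 mm x 10 mm patterned region at 55 um spacing.
    print(count_hex_grid_wells(10_000.0, 10_000.0, 55.0))
```

A practical layout would also subtract a border and account for the well diameter, so a count obtained this way is best read as an upper estimate.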

Microwell arrays may be fabricated using any of a number of fabrication techniques. Examples of fabrication methods that may be used include, but are not limited to, bulk micromachining techniques such as photolithography and wet chemical etching, plasma etching, or deep reactive ion etching; micro-molding and micro-embossing; laser micromachining; 3D printing or other direct write fabrication processes using curable materials; and similar techniques.

Microwell arrays can be fabricated from any of a number of substrate materials. The choice of material can depend on the choice of fabrication technique, and vice versa. Examples of suitable materials can include, but are not limited to, silicon, fused-silica, glass, polymers (e.g. agarose, gelatin, hydrogels, polydimethylsiloxane (PDMS; elastomer), polymethylmethacrylate (PMMA), polycarbonate (PC), polypropylene (PP), polyethylene (PE), high density polyethylene (HDPE), polyimide, cyclic olefin polymers (COP), cyclic olefin copolymers (COC), polyethylene terephthalate (PET), epoxy resins, thiol-ene based resins), metals or metal films (e.g. aluminum, stainless steel, copper, nickel, chromium, and titanium), and the like. In some instances, the microwell comprises optical adhesive. In some instances, the microwell is made out of optical adhesive. In some instances, the microwell array comprises and/or is made out of PDMS. In some instances, the microwell is made of plastic. A hydrophilic material can be desirable for fabrication of the microwell arrays (e.g. to enhance wettability and minimize non-specific binding of cells and other biological material). Hydrophobic materials that can be treated or coated (e.g. by oxygen plasma treatment, or grafting of a polyethylene oxide surface layer) can also be used. The use of porous, hydrophilic materials for the fabrication of the microwell array may be desirable in order to facilitate capillary wicking/venting of entrapped air bubbles in the device. The microwell array can be fabricated from a single material. The microwell array may comprise two or more different materials that have been bonded together or mechanically joined.

Microwell arrays can be fabricated using substrates of any of a variety of sizes and shapes. For example, the shape (or footprint) of the substrate within which microwells are fabricated may be square, rectangular, circular, or irregular in shape. The footprint of the microwell array substrate can be similar to that of a microtiter plate. The footprint of the microwell array substrate can be similar to that of standard microscope slides, e.g. about 75 mm long × 25 mm wide (about 3″ long × 1″ wide), or about 75 mm long × 50 mm wide (about 3″ long × 2″ wide). The thickness of the substrate within which the microwells are fabricated may range from about 0.1 mm thick to about 10 mm thick, or more. The thickness of the microwell array substrate may be at least 0.1 mm thick, at least 0.5 mm thick, at least 1 mm thick, at least 2 mm thick, at least 3 mm thick, at least 4 mm thick, at least 5 mm thick, at least 6 mm thick, at least 7 mm thick, at least 8 mm thick, at least 9 mm thick, or at least 10 mm thick. The thickness of the microwell array substrate may be at most 10 mm thick, at most 9 mm thick, at most 8 mm thick, at most 7 mm thick, at most 6 mm thick, at most 5 mm thick, at most 4 mm thick, at most 3 mm thick, at most 2 mm thick, at most 1 mm thick, at most 0.5 mm thick, or at most 0.1 mm thick. The thickness of the microwell array substrate can be about 1 mm thick. The thickness of the microwell array substrate may be any value within these ranges, for example, the thickness of the microwell array substrate may be between about 0.2 mm and about 9.5 mm. The thickness of the microwell array substrate may be uniform.

A variety of surface treatments and surface modification techniques may be used to alter the properties of microwell array surfaces. Examples can include, but are not limited to, oxygen plasma treatments to render hydrophobic material surfaces more hydrophilic, the use of wet or dry etching techniques to smooth (or roughen) glass and silicon surfaces, adsorption or grafting of polyethylene oxide or other polymer layers (such as pluronic), or bovine serum albumin to substrate surfaces to render them more hydrophilic and less prone to non-specific adsorption of biomolecules and cells, the use of silane reactions to graft chemically-reactive functional groups to otherwise inert silicon and glass surfaces, etc. Photodeprotection techniques can be used to selectively activate chemically-reactive functional groups at specific locations in the array structure. For example, the selective addition or activation of chemically-reactive functional groups such as primary amines or carboxyl groups on the inner walls of the microwells may be used to covalently couple oligonucleotide probes, peptides, proteins, or other biomolecules to the walls of the microwells. The choice of surface treatment or surface modification utilized can depend on either or both of the type of surface property that is desired and the type of material from which the microwell array is made.

The openings of microwells can be sealed, for example, during cell lysis steps to prevent cross hybridization of target nucleic acid between adjacent microwells. A microwell (or array of microwells) may be sealed or capped using, for example, a flexible membrane or sheet of solid material (i.e. a plate or platen) that clamps against the surface of the microwell array substrate, or a suitable bead, where the diameter of the bead is larger than the diameter of the microwell.

A seal formed using a flexible membrane or sheet of solid material can comprise, for example, inorganic nanopore membranes (e.g., aluminum oxides), dialysis membranes, glass slides, coverslips, elastomeric films (e.g. PDMS), or hydrophilic polymer films (e.g., a polymer film coated with a thin film of agarose that has been hydrated with lysis buffer).

Solid supports (e.g., beads) used for capping the microwells may comprise any of the solid supports (e.g., beads) of the disclosure. In some instances, the solid supports are cross-linked dextran beads (e.g., Sephadex). Cross-linked dextran beads can range from about 10 micrometers to about 80 micrometers in diameter. The cross-linked dextran beads used for capping can be from about 20 micrometers to about 50 micrometers. In some embodiments, the beads may be at least about 10, 20, 30, 40, 50, 60, 70, 80 or 90% larger than the diameter of the microwells. The beads used for capping may be at most about 10, 20, 30, 40, 50, 60, 70, 80 or 90% larger than the diameter of the microwells.
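The sizing relationship for capping beads can be expressed as a simple percentage calculation. The sketch below is illustrative only: the function names are hypothetical, and the check that a capping bead must exceed the well diameter follows the sealing description above rather than any stated formula.

```python
def percent_larger_than_well(bead_diameter_um: float,
                             well_diameter_um: float) -> float:
    """How much larger the bead diameter is than the well diameter, in %."""
    return 100.0 * (bead_diameter_um - well_diameter_um) / well_diameter_um


def can_cap_well(bead_diameter_um: float, well_diameter_um: float) -> bool:
    """A bead is treated as a candidate cap only if it is wider than the well."""
    return bead_diameter_um > well_diameter_um


if __name__ == "__main__":
    # Hypothetical example: a 45 um dextran bead over a 30 um well.
    print(can_cap_well(45.0, 30.0), percent_larger_than_well(45.0, 30.0))
```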

The seal or cap may allow buffer to pass into and out of the microwell, while preventing macromolecules (e.g., nucleic acids) from migrating out of the well. A macromolecule of at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides may be blocked from migrating into or out of the microwell by the seal or cap. A macromolecule of at most about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, or 20 or more nucleotides may be blocked from migrating into or out of the microwell by the seal or cap.

Solid supports (e.g., beads) may be distributed among a substrate. Solid supports (e.g., beads) can be distributed among wells of the substrate, removed from the wells of the substrate, or otherwise transported through a device comprising one or more microwell arrays by means of centrifugation or other non-magnetic means. A microwell of a substrate can be pre-loaded with a solid support. A microwell of a substrate can hold at least 1, 2, 3, 4, or 5, or more solid supports. A microwell of a substrate can hold at most 1, 2, 3, 4, or 5 or more solid supports. In some instances, a microwell of a substrate can hold one solid support.

Individual cells and beads may be compartmentalized using alternatives to microwells. For example, a single solid support and single cell could be confined within a single droplet in an emulsion (e.g. in a droplet digital microfluidic system).

Cells could potentially be confined within porous beads that themselves comprise the plurality of tethered stochastic barcodes. Individual cells and solid supports may be compartmentalized in any type of container, microcontainer, reaction chamber, reaction vessel, or the like.

Single cell combinatorial barcoding may be performed without the use of microwells. Single cell combinatorial barcoding assays may be performed without the use of any physical container. For example, combinatorial barcoding without a physical container can be performed by embedding cells and beads in close proximity to each other within a polymer layer or gel layer to create a diffusional barrier between different cell/bead pairs. In another example, combinatorial barcoding without a physical container can be performed in situ, in vivo, on an intact solid tissue, on an intact cell, and/or subcellularly.

Microwell arrays can be a consumable component of the assay system. Microwell arrays may be reusable. Microwell arrays can be configured for use as a stand-alone device for performing assays manually, or they may be configured to comprise a fixed or removable component of an instrument system that provides for full or partial automation of the assay procedure. In some embodiments of the disclosed methods, the bead-based libraries of stochastic barcodes can be deposited in the wells of the microwell array as part of the assay procedure. In some embodiments, the beads may be pre-loaded into the wells of the microwell array and provided to the user as part of, for example, a kit for performing stochastic barcoding and digital counting of nucleic acid targets.

In some embodiments, two mated microwell arrays are provided, one pre-loaded with beads which are held in place by a first magnet, and the other for use by the user in loading individual cells. Following distribution of cells into the second microwell array, the two arrays may be placed face-to-face and the first magnet removed while a second magnet is used to draw the beads from the first array down into the corresponding microwells of the second array, thereby ensuring that the beads rest above the cells in the second microwell array and thus minimizing diffusional loss of target molecules following cell lysis, while maximizing efficient attachment of target molecules to the stochastic barcodes on the bead.

Microwell arrays of the disclosure can be pre-loaded with solid supports (e.g., beads). Each well of a microwell array can comprise a single solid support. At least 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the wells in a microwell array can be pre-loaded with a single solid support. At most 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100% of the wells in a microwell array can be pre-loaded with a single solid support. The solid support can comprise stochastic barcodes and/or combinatorial barcodes of the disclosure. Cell labels of stochastic barcodes on different solid supports can be different. Cell labels of stochastic barcodes on the same solid support can be the same.

While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

One skilled in the art will appreciate that, for this and other processes and methods disclosed herein, the functions performed in the processes and methods can be implemented in differing order. Furthermore, the outlined steps and operations are only provided as examples, and some of the steps and operations can be optional, combined into fewer steps and operations, or expanded into additional steps and operations without detracting from the essence of the disclosed embodiments.

With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.

It will be understood by those within the art that, in general, termsused herein, and especially in the appended claims (e.g., bodies of theappended claims) are generally intended as “open” terms (e.g., the term“including” should be interpreted as “including but not limited to,” theterm “having” should be interpreted as “having at least,” the term“includes” should be interpreted as “includes but is not limited to,”etc.). It will be further understood by those within the art that if aspecific number of an introduced claim recitation is intended, such anintent will be explicitly recited in the claim, and in the absence ofsuch recitation no such intent is present. For example, as an aid tounderstanding, the following appended claims may contain usage of theintroductory phrases “at least one” and “one or more” to introduce claimrecitations. However, the use of such phrases should not be construed toimply that the introduction of a claim recitation by the indefinitearticles “a” or “an” limits any particular claim containing suchintroduced claim recitation to embodiments containing only one suchrecitation, even when the same claim includes the introductory phrases“one or more” or “at least one” and indefinite articles such as “a” or“an” (e.g., “a” and/or “an” should be interpreted to mean “at least one”or “one or more”); the same holds true for the use of definite articlesused to introduce claim recitations. In addition, even if a specificnumber of an introduced claim recitation is explicitly recited, thoseskilled in the art will recognize that such recitation should beinterpreted to mean at least the recited number (e.g., the barerecitation of “two recitations,” without other modifiers, means at leasttwo recitations, or two or more recitations). 
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”

In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.

As will be understood by one skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art all language such as “up to,” “at least,” and the like include the number recited and refer to ranges which can be subsequently broken down into subranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 cells refers to groups having 1, 2, or 3 cells. Similarly, a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.

From the foregoing, it will be appreciated that various embodiments of the present disclosure have been described herein for purposes of illustration, and that various modifications may be made without departing from the scope and spirit of the present disclosure. Accordingly, the various embodiments disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

What is claimed is:
 1. A method of selective amplification, comprising: providing a plurality of sample molecules comprising a plurality of nucleic acid target molecules and one or more undesirable nucleic acid species; providing one or more amplification primers; providing a blocking oligonucleotide that specifically binds to at least one of the one or more undesirable nucleic acid species within 100 nt of the 5′ end of the one or more undesirable nucleic acid species; and amplifying the plurality of sample molecules in the presence of the blocking oligonucleotide and the one or more amplification primers to generate a plurality of amplicons, whereby the amplification of the undesirable nucleic acid species is reduced by the blocking oligonucleotide not being able to function as a primer for a polymerase.
 2. The method of claim 1, wherein the nucleic acid target molecules comprise DNA molecules, RNA molecules, genomic DNA molecules, cDNA molecules, mRNA molecules, rRNA molecules, mtDNA, siRNA molecules, or any combination thereof.
 3. The method of claim 1, wherein the sample molecules comprise whole transcriptome amplification (WTA) products.
 4. The method of claim 1, wherein the one or more undesirable nucleic acid species comprise rRNA, mtRNA, genomic DNA, intronic sequence, high-abundance sequence, or any combination thereof.
 5. The method of claim 1, wherein the one or more undesirable nucleic acid species amount to about 50%, 60%, 70%, or 80% of the nucleic acid content of the plurality of sample molecules.
 6. The method of claim 1, comprising providing blocking oligonucleotides that specifically bind to two or more undesirable nucleic acid species in the plurality of sample molecules.
 7. The method of claim 1, wherein the blocking oligonucleotide specifically binds to within 50 nt of the 5′ end of the one or more undesirable nucleic acid species.
 8. The method of claim 1, wherein the blocking oligonucleotide is 10 nt to 50 nt long.
 9. The method of claim 1, wherein the blocking oligonucleotide has a T_(m) of at least 60° C.
 10. The method of claim 1, wherein the one or more amplification primers add sequencing adaptors to the plurality of extension products.
 11. The method of claim 10, further comprising sequencing the plurality of amplicons, or products thereof.
 12. The method of claim 1, further comprising removing a hybridized complex formed between the blocking oligonucleotide and the undesirable nucleic acid species, wherein the removing comprises immobilizing the hybridized complex formed between the blocking oligonucleotide and the undesirable nucleic acid species on a solid support, wherein the blocking oligonucleotide comprises an affinity moiety, and wherein the solid support comprises a binding partner of the affinity moiety.
 13. A kit for selective amplification of a plurality of nucleic acid target molecules in a sample, comprising: one or more blocking oligonucleotides that specifically bind to at least one of one or more undesirable nucleic acid species in the sample within 100 nt of the 5′ end of the one or more undesirable nucleic acid species, wherein the one or more blocking oligonucleotides are unable to function as a primer for a reverse transcriptase or a polymerase.
 14. The kit of claim 13, further comprising a reverse transcriptase, a polymerase, a ligase, a nuclease, a plurality of particles each comprising a plurality of oligonucleotide probes, or any combination thereof.
 15. The kit of claim 13, further comprising a solid support, wherein the one or more blocking oligonucleotides comprise an affinity moiety, wherein the solid support comprises a binding partner of the affinity moiety, and wherein the affinity moiety comprises biotin, streptavidin, heparin, an aptamer, a click-chemistry moiety, digoxigenin, primary amine(s), carboxyl(s), hydroxyl(s), aldehyde(s), ketone(s), or any combination thereof.
 16. The kit of claim 13, wherein the undesirable nucleic acid species comprise rRNA, mtRNA, genomic DNA, intronic sequence, high-abundance sequence, or any combination thereof.
 17. The kit of claim 13, wherein the one or more blocking oligonucleotides specifically bind to two or more undesirable nucleic acid species in the sample.
 18. The kit of claim 13, wherein the one or more blocking oligonucleotides comprise a locked nucleic acid (LNA), a peptide nucleic acid (PNA), a DNA, an LNA/PNA chimera, an LNA/DNA chimera, a PNA/DNA chimera, or any combination thereof.
 19. The kit of claim 13, wherein the one or more blocking oligonucleotides specifically bind to within 50 nt of the 5′ end of the one or more undesirable nucleic acid species.
 20. The kit of claim 13, wherein the one or more blocking oligonucleotides has a T_(m) of at least 60° C.